Community Detection in Random Networks 

Ery Arias-Castro and Nicolas Verzelen 



We formalize the problem of detecting a community in a network into testing whether in a given 
(random) graph there is a subgraph that is unusually dense. We observe an undirected and 
unweighted graph on $N$ nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi 
graph with probability $p_0$. Under the (composite) alternative, there is a subgraph of $n$ nodes where 
the probability of connection is $p_1 > p_0$. We derive a detection lower bound for detecting such a 
subgraph in terms of $N, n, p_0, p_1$ and exhibit a test that achieves that lower bound. We do this both 
when $p_0$ is known and unknown. We also consider the problem of testing in polynomial-time. As 
an aside, we consider the problem of detecting a clique, which is intimately related to the planted 
clique problem. Our focus in this paper is in the quasi-normal regime where $np_0$ is either bounded 
away from zero, or tends to zero slowly. 
1 Introduction 

In recent years, the problem of detecting communities in networks has received a large amount of 
attention, with important applications in the social and biological sciences, among others (Fortunato, 
2010). The vast majority of this expansive literature focuses on developing realistic models of 
(random) networks (Albert and Barabasi, 2002; Barabasi and Albert, 1999), on designing methods 
for extracting communities from such networks (Girvan and Newman, 2002; Newman, 2006; 
Reichardt and Bornholdt, 2006) and on fitting models to network data (Bickel et al., 2011). 

The underlying model is that of a graph $\mathcal{G} = (\mathcal{E}, \mathcal{V})$, where $\mathcal{E}$ is the set of edges and $\mathcal{V}$ is 
the set of nodes. For example, in a social network, a node would represent an individual and 
an edge between two nodes would symbolize a friendship or kinship of some sort shared by 
these two individuals. In the literature just mentioned, almost all the methodology has concentrated 
on devising graph partitioning methods, with the end goal of clustering the nodes in 
$\mathcal{V}$ into groups with strong inner-connectivity and weak inter-connectivity (Bickel and Chen, 2009; 
Lancichinetti and Fortunato, 2009; Newman and Girvan, 2004). 

In this euphoria, perhaps the most basic problem of actually detecting the presence of a com- 
munity in an otherwise homogeneous network has been overlooked. From a practical standpoint, 
this sort of problem could arise in a dynamic setting where a network is growing over time and 
monitored for clustering. From a mathematical perspective, probing the limits of detection (i.e., 
hypothesis testing) often offers insight into what is possible in terms of extraction (i.e., estimation). 

Many existing community extraction methods can be turned into community detection proce- 
dures. For example, one could decide that a community is present in the network if the modularity 
of Newman and Girvan (2004) exceeds a given threshold. To set this threshold, one needs to de- 
fine a null model. Newman and Girvan (2004) implicitly assume a random graph conditional on 
the node degrees. Here, we make the simplest assumption that the null model is an Erdős-Rényi 
random graph (Bollobas, 2001). 

In this context, we also touch on another line of work, that of detecting a clique in a random 
graph — the so-called Planted (or Hidden) Clique Problem (Alon et al., 1998; Dekel et al., 2011; 
Feige and Ron, 2010). Although the emphasis there is on the detection performance of 
computationally tractable algorithms, we mostly ignore computational considerations and simply establish 
the absolute detection limits of any algorithm whatsoever. 

1.1 The framework 

We address a stylized community detection problem, where the task is to detect the presence 
of clustering in the network and is formalized as a hypothesis testing problem. We observe an 
undirected graph $\mathcal{G} = (\mathcal{E}, \mathcal{V})$ with $N := |\mathcal{V}|$ nodes. Without loss of generality, we take 
$\mathcal{V} = [N] := \{1, \dots, N\}$. The corresponding adjacency matrix is denoted $W = (W_{i,j}) \in \{0,1\}^{N \times N}$, 
where $W_{i,j} = 1$ if, and only if, $(i,j) \in \mathcal{E}$, meaning there is an edge between nodes $i, j \in \mathcal{V}$. 
Note that $W$ is symmetric, and we assume that $W_{i,i} = 0$ for all $i$. Under the null hypothesis, the 
graph $\mathcal{G}$ is a realization of $\mathbb{G}(N, p_0)$, the Erdős-Rényi random graph on $N$ nodes with probability 
of connection $p_0 \in (0, 1)$; equivalently, the upper diagonal entries of $W$ are independent and 
identically distributed with $\mathbb{P}(W_{i,j} = 1) = p_0$ for any $i \neq j$. Under the alternative, there is a subset 
of nodes indexed by $S \subset \mathcal{V}$ such that $\mathbb{P}(W_{i,j} = 1) = p_1$ for any $i, j \in S$ with $i \neq j$, with 
everything else the same. We assume that $p_1 > p_0$, implying that the connectivity is stronger between 
nodes in $S$. When $p_1 = 1$, the subgraph with node set $S$ is a clique. The subset $S$ is not known, 
although in most of the paper we assume that its size $n := |S|$ is known. 
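
To make the two hypotheses concrete, the following is a minimal sketch of how one might simulate the adjacency matrix $W$ under the null and under the alternative. The function name `sample_graph` and its arguments are hypothetical and chosen only for illustration; nodes are indexed from 0 rather than 1.

```python
import random


def sample_graph(N, p0, community=None, p1=None, seed=None):
    """Sample a symmetric 0/1 adjacency matrix with zero diagonal.

    Pairs of nodes inside `community` are connected with probability p1,
    all other pairs with probability p0. With community=None this is the
    null model G(N, p0).
    """
    if community is not None and p1 is None:
        raise ValueError("p1 is required when a community is given")
    rng = random.Random(seed)
    S = set(community or [])
    W = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):  # upper-diagonal entries are independent
            p = p1 if (i in S and j in S) else p0
            if rng.random() < p:
                W[i][j] = W[j][i] = 1  # keep W symmetric
    return W
```

For instance, `sample_graph(50, 0.1, community=range(8), p1=0.6, seed=1)` draws one graph from the alternative with $N = 50$, $n = 8$.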

We study detectability in this framework in asymptotic regimes where $n, N \to \infty$, and $p_0, p_1$ 
may also change; all these parameters are assumed to be functions of $N$. A test $T$ is a function 
that takes $W$ as input and returns $T = 1$ to claim there is a community in the network, and $T = 0$ 
otherwise. The (worst-case) risk of a test $T$ is defined as 

$$\gamma_N(T) = \mathbb{P}_0(T = 1) + \max_{|S| = n} \mathbb{P}_S(T = 0),$$ 

where $\mathbb{P}_0$ is the distribution under the null and $\mathbb{P}_S$ is the distribution under the alternative where 
$S$ indexes the community. We say that a sequence of tests $(T_N)$ for a sequence of problems $(W_N)$ 
is asymptotically powerful (resp. powerless) if $\gamma_N(T_N) \to 0$ (resp. $\to 1$). Practically speaking, a 
sequence of tests is asymptotically powerless if it does not perform substantially better than 
guessing that ignores the adjacency matrix $W$. We will often speak of a test being powerful or 
powerless when in fact referring to a sequence of tests and its asymptotic power properties. 

1.2 Closely related work 

We take the beaten path, following the standard approach in statistics for analyzing such composite 
hypothesis testing problems, in particular, the work of Ingster (1997) and others (Donoho and Jin, 
2004; Hall and Jin, 2010; Ingster and Suslina, 2002) on the detection of a sparse (normal) mean 
vector. Most closely related to our work is that of Butucea and Ingster (2011). Specializing their 
results to our setting, they derive lower bounds and upper bounds for the same detection problem 
when the graph is directed and the probability of connection under the null (denoted $p_0$) is fixed, 
which is a situation where the graph is extremely dense. Their work leaves out the interesting 
regime where $p_0 \to 0$, which leads to a null model that is much more sparse. 
1.3 Main Contribution 

Our main contribution in this paper is to derive a sharp detection boundary for the problem of 
detecting a community in a network as described above. We focus here on the quasi-normal regime^1 
where $np_0$ is either bounded away from zero, or tends to zero slowly, specifically, 



On the one hand, we derive an information theoretic bound that applies to all tests, meaning 
conditions under which all tests are powerless. On the other hand, we display a test that basically 
achieves the best performance possible. The test is the combination of the two natural tests that 
arise in Butucea and Ingster (2011) and much of the work in that field (Arias-Castro et al., 2011; 
Ingster et al., 2010): 

• Total degree test. This test rejects when the total number of edges is unusually large. This is 
global in nature in that it cannot be directly turned into a method for extraction. 

• Scan (or maximum modularity) test. This test amounts to turning modularity into a test 
statistic by rejecting when its maximum value is unusually large. It is strictly speaking the 
generalized likelihood ratio test under our framework. 

We also consider the situation, common in practice, where $p_0$ is unknown. Interestingly, the 
detection boundary becomes larger than in the former setting when $n$ is moderately sparse. We 
derive the corresponding lower bound in this situation and design a test that achieves this bound. 
The test is again the combination of two tests: 

• Degree variance test. This test is based on the differences between two estimates for the 
degree variance, an analysis of variance of sorts. (Note that the total degree test cannot be 
calibrated without knowledge of $p_0$.) 

• Scan test. This test can be calibrated in various ways when $p_0$ is unknown, for example by 
estimation of $p_0$ based on the whole graph, or by permutation. We study the former. 

Finally, we consider various polynomial-time algorithms, the main one being a convex relaxation 
of the scan test based on a sparse eigenvalue problem formulation. Our inspiration there comes 
from the recent work of Berthet and Rigollet (2012). We discuss the discrepancy between the 
performances of the scan test and the relaxed scan test and compare it with other polynomial-time 
tests. 

We summarize our findings in Tables 1 and 2, where 

$$R = \frac{\sqrt{n}\,(p_1 - p_0)}{\sqrt{p_0(1 - p_0)}}$$ 

is (up to a $\sqrt{n/2}$ factor) the SNR for detecting the dense subgraph when it is known. 



^1 The quasi-Poisson regime where $np_0 \to 0$ polynomially fast is qualitatively different and necessitates different 
proof arguments. This is beyond the scope of this paper and will appear somewhere else. 
Table 1: Detection boundary and near-optimal algorithms. For any sequences $a$ and $b$ going to 
infinity, $a \ll b$ (resp. $a \gg b$) means that there exists $\epsilon > 0$ arbitrarily small such that 
$a \leq b^{1-\epsilon}$ (resp. $a \geq b^{1+\epsilon}$). 

  $p_0$ known                                   |  $p_0$ unknown 
  $R > 2\sqrt{\log(N/n)}$  |  $R \gg N/n^{3/2}$   |  $R > 2\sqrt{\log(N/n)}$  |  $R \gg N^{3/4}/n$ 
  Scan test                |  Tot. Deg. test      |  Scan test                |  Deg. Var. test 


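Table 2: Polynomial time algorithms 

  $p_0$ known                                    |  $p_0$ unknown 
  $n \ll \sqrt{N}$         |  $n \gg \sqrt{N}$     |  $n \ll \sqrt{N}$         |  $n \gg \sqrt{N}$ 
  $R > 2\sqrt{N \log N}$   |  $R \gg N/n^{3/2}$    |  $R > 2\sqrt{N \log N}$   |  $R \gg N^{3/4}/n$ 
  Relax. Scan test         |  Tot. Deg. test       |  Relax. Scan test         |  Deg. Var. test 
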

1.4 Finding a clique 

We start the paper by addressing the problem of detecting the presence of a large clique in the 
graph, and treat it separately, as it is an interesting case in its own right. It is simpler and allows 
us to focus on the regime where $n/\log N \to \infty$ in the rest of the paper. We establish a lower bound 
and prove that the following (obvious) test achieves that bound: 

• Clique number test. This test rejects when the clique number of the graph is 
unusually large. It can be calibrated without knowledge of $p_0$, for example by permutation, 
but we do not know of a polynomial-time algorithm that comes even close. 

1.5 Content 

In Section 2, we consider the problem of detecting the presence of a large clique and analyze 
the clique number test. In Section 3, we consider the more general problem of detecting a densely 
connected subgraph and analyze the total degree test and the scan test. The more realistic situation 
of unknown $p_0$ is handled in Section 4. In Section 5.2, we investigate polynomial-time tests. We then 
discuss our results and the outlook in Section 6. The technical proofs are postponed to Section 7. 

1.6 General assumptions and notation 

We assume throughout that $N \to \infty$ and the other parameters $n, p_0, p_1$ (and more) are allowed 
to change with $N$, unless specified otherwise. This dependency is left implicit. In particular, we 
assume that $n/N \to 0$, emphasizing the situation where the community to be detected is small 
compared to the size of the whole network. (When $n$ is of the same order as $N$, the total degree 
test is basically optimal.) We assume that $p_0$ is bounded away from 1, which is the most interesting 
case by far, and that $N^2 p_0 \to \infty$, the latter implying that the number of edges in the network 
(under the null) is not bounded. We also hypothesize that either $p_1 = 1$ or $n \to \infty$ with $n^2 p_1 \to \infty$, 
for otherwise there is a non-vanishing chance that the community does not contain any edges, 
precluding any test from being powerful. 

We use standard notation such as $a_n \sim b_n$ when $a_n/b_n \to 1$; $a_n = o(b_n)$ when $a_n/b_n \to 0$; 
$a_n = O(b_n)$ when $a_n/b_n$ is bounded; $a_n \asymp b_n$ when $a_n = O(b_n)$ and $b_n = O(a_n)$; $a_n \prec b_n$ when 
there exists a positive constant $C$ such that $a_n \leq C b_n$ and $a_n \succ b_n$ when there exists a positive 
constant $C$ such that $a_n \geq C b_n$. For an integer $n$ let $n^{(2)} = n(n-1)/2$. For two distributions $L_1$ 
and $L_2$ on the real line, let $L_1 * L_2$ denote their convolution, which is the distribution of the sum 
of two independent random variables $X_1 \sim L_1$ and $X_2 \sim L_2$. 

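Because of its importance in describing the tails of the binomial distribution, the following 
function (the relative entropy or Kullback-Leibler divergence of Bern($q$) to Bern($p$)) 
will appear in our results: 

$$H_p(q) := q \log\frac{q}{p} + (1 - q) \log\frac{1 - q}{1 - p}, \qquad (2)$$ 

with the shorthand $H(q) := H_{p_0}(q)$. 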

2 Detecting a large clique in a random graph 

We start with specializing the setting to that of detecting a large clique, meaning we consider the 
special case where $p_1 = 1$. In this section, $n$ is not necessarily increasing with $N$. 

2.1 Lower bound 

We establish the detection boundary, giving sufficient conditions for the problem to be too hard 
for any test, meaning that all tests are asymptotically powerless. 

Theorem 1. All tests are asymptotically powerless if 



The result is, in fact, very intuitive. Condition (3) implies that, with high probability under 
the null, the clique number is at least $n$, which is the size of the implanted clique under the 
alternative. This is a classical result in random graph theory, and finer results are known; see 
(Bollobas, 2001, Chap. 11). The arguments underlying Theorem 1 are, however, based on studying 
the likelihood ratio test when a uniform prior is assumed on the implanted clique $S$, which is the 
standard approach in detection settings; see (Lehmann and Romano, 2005, Ch. 8). In this specific 
setting, the second moment method (which consists in showing that the variance of the likelihood 
ratio tends to 0) suffices. 
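
The intuition about the clique number under the null can be explored numerically through the classical first-moment quantity, the expected number of cliques of size $n$ in $\mathbb{G}(N, p_0)$, namely $\binom{N}{n} p_0^{n^{(2)}}$. The sketch below computes its logarithm; it is offered only as an illustration of the standard heuristic and is not claimed to be the exact form of condition (3) or (4). The function name is hypothetical.

```python
import math


def log_expected_cliques(N, n, p0):
    """Log of the expected number of n-cliques in G(N, p0).

    E[#cliques of size n] = C(N, n) * p0 ** (n * (n - 1) / 2).
    Working on the log scale avoids overflow and underflow.
    """
    pairs = n * (n - 1) / 2  # number of edges in a clique of size n
    return math.log(math.comb(N, n)) + pairs * math.log(p0)
```

When this quantity is very negative, cliques of size $n$ are rare under the null (first moment bound); when it is large and positive, they are expected to be plentiful. For $N = 1000$ and $p_0 = 1/2$ the sign changes between $n = 10$ and $n = 30$.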

2.2 The clique number test 

Computational considerations aside, the most natural test for detecting the presence of a clique is 
the clique number test defined in the Introduction. We obtain the following. 

Proposition 1. The clique number test is powerful if 






(3) 




(4) 
The proof is entirely based on the fact that, when (4) holds, the clique number under the null 
is at most $n - 1$ with high probability (Bollobas, 2001, Th. 11.6), while it is at least $n$ under the 
alternative. (Thus the proof is omitted.) We conclude that the clique number test achieves 
the detection boundary established in Theorem 1. 



3 Detecting a dense subgraph in a random graph 

We now consider the more general setting of detecting a dense subgraph in a random graph. 
We start with an information bound that applies to all tests, regardless of their computational 
requirements. We then study the total degree test and the scan test, showing that the test that 
combines them with a simple Bonferroni correction is essentially optimal. 



3.1 Lower bound 

When assuming infinite computational power, what is left is the purely statistical challenge of 
detecting the subgraph. For simplicity, we assume that $n$ is not too small, specifically, 

$$\frac{n}{\log N} \to \infty, \qquad (5)$$ 

though our result below partially extends to this, particularly when $p_1$ is constant. As usual, a 
minimax lower bound is derived by choosing a prior over the composite alternative. Assuming 
that $p_0$ and $p_1$ are known, because of symmetry, the uniform prior over the community $S$ is least 
favorable, so that we consider testing 

$$H_0 : \mathcal{G} \sim \mathbb{G}(N, p_0) \quad \text{versus} \quad H_1 : \mathcal{G} \sim \mathbb{G}(N, p_0; n, p_1), \qquad (6)$$ 

where the latter is the model where the community $S$ is chosen uniformly at random among subsets 
of nodes of size $n$, and then for $i \neq j$, $\mathbb{P}(W_{i,j} = 1) = p_1$ if $i, j \in S$, while $\mathbb{P}(W_{i,j} = 1) = p_0$ otherwise. 
For this simple versus simple testing problem, the likelihood ratio test is optimal, which is what 
we examine to derive the following lower bound. Remember the entropy function defined in (2). 

Theorem 2. Assuming (5) and (1) hold, all tests are asymptotically powerless if 

$$\frac{p_1 - p_0}{\sqrt{p_0}} \, \frac{n^2}{N} \to 0 \qquad (7)$$ 

and 

$$\limsup \frac{n H(p_1)}{2 \log(N/n)} < 1. \qquad (8)$$ 

Conditions (7) and (8) have their equivalent in the work of Butucea and Ingster (2011). That 
said, (8) is more complex here because of the different behaviors of the entropy function according 
to whether $p_1/p_0$ is small or large, corresponding to the difference between large deviations and 
moderate deviations of the binomial distribution. Only in the case where $p_1/p_0 \to 1$ is the normal 
approximation to the binomial in effect. 

To better appreciate (8), note that it is equivalent to 

$$\limsup \frac{n (p_1 - p_0)^2}{4 p_0 (1 - p_0) \log(N/n)} < 1, \quad \text{when } \frac{n p_0}{\log(N/n)} \to \infty; \qquad (9)$$ 

and 

$$\limsup \frac{n p_1}{2 (1 - p_0) \log(N/n)} \log\left(\frac{\log(N/n)}{n p_0}\right) < 1, \quad \text{when } \frac{n p_0}{\log(N/n)} \to 0. \qquad (10)$$ 

In (9), $np_0$ is larger and only the moderate deviations of the binomial distribution are involved, 
while in (10), $np_0$ is smaller and the large deviations come into play. 

Theorem 2 happens to be sharp because, as we show next, the test that combines the total 
degree test and the scan test is asymptotically powerful when the conditions (7) and (8) are — 
roughly speaking — reversed. 
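
To get a feel for the two conditions, one can evaluate the quantities they involve for given $(N, n, p_0, p_1)$. The sketch below assumes that $H$ is the Kullback-Leibler divergence of Bern($p_1$) to Bern($p_0$) with natural logarithms, and that the ratio in (7) reads $(p_1 - p_0) n^2 / (\sqrt{p_0} N)$; both readings and all function names are assumptions made for illustration, and $0 < p_0 < 1$ is required.

```python
import math


def bernoulli_kl(q, p):
    """KL divergence of Bern(q) to Bern(p), with the convention 0 log 0 = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1 - q, 1 - p)


def scan_ratio(N, n, p0, p1):
    """Entropy ratio n H(p1) / (2 log(N/n)) compared with 1 in (8)."""
    return n * bernoulli_kl(p1, p0) / (2 * math.log(N / n))


def degree_ratio(N, n, p0, p1):
    """Ratio (p1 - p0) n^2 / (sqrt(p0) N) whose limit appears in (7)."""
    return (p1 - p0) / math.sqrt(p0) * n ** 2 / N
```

A finite-sample value of `scan_ratio` above 1 or a large `degree_ratio` only suggests which side of the asymptotic boundary a configuration sits on; the statements themselves concern limits.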

3.2 The total degree test 

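The total degree test rejects for large values of 

$$W := \sum_{1 \leq i < j \leq N} W_{i,j}. \qquad (11)$$ 

The resulting test is exceedingly simple to analyze, since 

$$W \sim \mathrm{Bin}(N^{(2)} - n^{(2)}, p_0) * \mathrm{Bin}(n^{(2)}, p_1). \qquad (12)$$ 

Proposition 2. The total degree test is powerful if 

$$\frac{p_1 - p_0}{\sqrt{p_0}} \, \frac{n^2}{N} \to \infty. \qquad (13)$$ 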

It is equally straightforward to show that the total degree test has risk strictly less than one 
(meaning it has some non-negligible power) when the same ratio tends to a positive and finite 
constant, while it is asymptotically powerless when that ratio tends to zero. 
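
As a small illustration, the total degree statistic and its standardized version under a known $p_0$ can be computed as follows. Standardizing by the null mean and standard deviation of a $\mathrm{Bin}(N^{(2)}, p_0)$ variable is one natural way to calibrate the test; it is shown here as an assumption, not as the paper's prescribed calibration, and the function names are hypothetical.

```python
import math


def total_degree(W):
    """Total number of edges: sum of W[i][j] over i < j."""
    N = len(W)
    return sum(W[i][j] for i in range(N) for j in range(i + 1, N))


def total_degree_zscore(W, p0):
    """Standardized total degree under the null G(N, p0)."""
    N = len(W)
    pairs = N * (N - 1) // 2  # N^(2), the number of node pairs
    mean = pairs * p0
    sd = math.sqrt(pairs * p0 * (1 - p0))
    return (total_degree(W) - mean) / sd
```

A large positive z-score points toward rejecting the null; since the statistic is a sum over all pairs, it carries no information about which nodes form the community.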

3.3 The scan test 

The scan test is another name for the generalized likelihood ratio test, and corresponds to the test 
that is based on the maximum modularity. It is particularly simple when $p_0$ is known, as it rejects 
for large values of 

$$W_n^* := \max_{|S| = n} W_S, \qquad W_S := \sum_{i, j \in S,\ i < j} W_{i,j}. \qquad (14)$$ 

Unlike the total degree (11), the scan statistic (14) has an intricate distribution as the partial 
sums $W_S$ are not independent. Nevertheless, the union bound and standard tail bounds for the 
binomial distribution lead to the following result. 

Proposition 3. The scan test is powerful if 

$$\liminf \frac{n H(p_1)}{2 \log(N/n)} > 1. \qquad (15)$$ 
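
For very small graphs, the scan statistic can be computed by exhaustive enumeration, which makes its combinatorial nature explicit. The sketch below is a brute-force illustration only (it visits all $\binom{N}{n}$ subsets); the function name is hypothetical.

```python
from itertools import combinations


def scan_statistic(W, n):
    """Maximum number of edges among all node subsets of size n (brute force)."""
    N = len(W)
    best = 0
    for S in combinations(range(N), n):
        # W_S: number of edges with both endpoints in S
        w_S = sum(W[i][j] for i, j in combinations(S, 2))
        best = max(best, w_S)
    return best
```

Already for $N = 40$ and $n = 10$ this loop visits more than $8 \times 10^8$ subsets, which is why the computational question is treated separately in the paper.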

3.4 The combined test 

Having studied these two tests individually, we are now in a position to consider them together, by 
which we mean a simple Bonferroni combination which rejects when either of the two tests rejects. 
Looking back at our lower bound and the performance bounds we established for these tests, we 
come to the following conclusion. When the limit in (7) is infinite — yielding (13) — then the total 
degree test is asymptotically powerful by Proposition 2. When the limit inferior in (8) exceeds one 
— yielding (15) — then the scan test is asymptotically powerful by Proposition 3. 
3.5 Adaptation to unknown n 

The scan statistic in (14) requires knowledge of $n$. When this is unknown, the common procedure 
is to combine the scan tests at all different sizes $n$ using a simple Bonferroni correction. This is 
done in (Butucea and Ingster, 2011), with the conclusion that the resulting test is essentially as 
powerful as the individual tests. It is straightforward to see that, here too, the tail bound used in 
the proof of Proposition 3 allows for enough room to scan over all subgraphs of all sizes. 



4 When $p_0$ is unknown: the fixed expected total degree model 

Although it leads to interesting mathematics, the setting where $p_0$ is known is, for the most part, 
impractical. In this section, we evaluate how not knowing $p_0$ changes the difficulty of the problem. 
In fact, it makes the problem strictly more difficult in the denser regime. 

There are (at least) two ways of formalizing the situation where $p_0$ is unknown. In the first 
option, we still consider the exact same hypothesis testing problem, but maximize the risk over 
relevant subsets of $p_0$'s and $p_1$'s, since now even the null hypothesis is composite. In the second 
option, which is the one we detail, for a given pair of probabilities $0 < p_0' < p_1 < 1$, we consider 
testing 

$$H_0 : \mathcal{G} \sim \mathbb{G}(N, p_0) \quad \text{versus} \quad H_1' : \mathcal{G} \sim \mathbb{G}(N, p_0'; n, p_1), \qquad p_0 := p_0' + (p_1 - p_0') \frac{n^{(2)}}{N^{(2)}}. \qquad (16)$$ 

Note that, in this setting, we still assume that $p_0, p_1, n$ are known to the statistician. By design, 
the graph has the same expected total degree under the null and under the alternative hypotheses, 
that is we have 

$$\mathbb{E}_0(W) = N^{(2)} p_0' + n^{(2)} (p_1 - p_0') = \mathbb{E}'_S(W), \quad \forall S : |S| = n,$$ 

where $\mathbb{P}'_S$ and $\mathbb{E}'_S$ denote the probability distribution and corresponding expectation under the 
model where, for any $i \neq j$, $\mathbb{P}(W_{i,j} = 1) = p_1$ if $i, j \in S$, while $\mathbb{P}(W_{i,j} = 1) = p_0'$ otherwise. 
The risk of a test $T$ for this problem is defined as 

$$\gamma_N(T) = \mathbb{P}_0(T = 1) + \max_{|S| = n} \mathbb{P}'_S(T = 0).$$ 

We say that a sequence of tests $(T_N)$ is asymptotically powerful for the problem with fixed 
expected total degree (resp. powerless) if $\gamma_N(T_N) \to 0$ (resp. $\gamma_N(T_N) \to 1$). 

We first compute the detection boundary for this problem and then exhibit some tests achieving 
this detection boundary. Interestingly, these tests do not require the knowledge of $p_0$ and $p_1$, or 
even $n$, so that they can be used in the original setting (6) when these parameters are unknown. 

4.1 Lower bound 

Theorem 3. Assuming (5) holds and that 



log (1 V ^ ) 

all tests are asymptotically powerless for the problem (16) if 












log 











(17) 
and 

$$\limsup \frac{n H_{p_0'}(p_1)}{2 \log(N/n)} < 1. \qquad (19)$$ 



Comparing with Theorem 2, where $p_0$ is assumed to be known, the condition (18) is substantially 
weaker than the corresponding condition (7), while we shall see in the proof that (19) is comparable 
to (8). That said, when $n^2 \ll N$, the entropy condition (8) is a stronger requirement than either 
(7) or (18), implying that the setting where $p_0$ is known and the setting where it is unknown are 
asymptotically as difficult in that case. 



4.2 Degree variance test 

By construction, the total degree W has the same expectation under the null and under the al- 
ternative in the testing problem with fixed expected total degree — and same variance also up to 
second order — making it difficult to see how to fruitfully use this statistic in this context. 

We design instead a test based on comparing two estimators for the node degree variance, 
not unlike an analysis of variance. Let 

$$W_{i\cdot} := \sum_{j \neq i} W_{i,j} \qquad (20)$$ 

denote the degree of node $i$ in the whole network. The first estimate is simply the maximum 
likelihood estimator under the null 

$$V_1 := (N - 1) \frac{N^{(2)}}{N^{(2)} - 1} \, \hat{p}_0 (1 - \hat{p}_0), \qquad \hat{p}_0 := \frac{W}{N^{(2)}}.$$ 

The second estimator is some sort of sample variance, modified to account for the fact that the $W_{i\cdot}$ 
are not independent 

$$V_2 := \frac{1}{N - 2} \sum_{i=1}^{N} \left( W_{i\cdot} - (N - 1) \hat{p}_0 \right)^2.$$ 

Both estimators are unbiased for the degree variance under the null, meaning $\mathbb{E}_0 V_1 = \mathbb{E}_0 V_2 = 
(N - 1) p_0 (1 - p_0)$. Under the alternative, $V_2$ tends to be larger than $V_1$, leading to a test that 
rejects for large values of 

$$V^* := \frac{V}{\sqrt{N} \, \hat{p}_0}, \qquad V := V_2 - V_1. \qquad (21)$$ 

Proposition 4. Assume that $p_0 \succ 1/N$. The degree variance test is asymptotically powerful under 
fixed expected total degree if 

$$\frac{(p_1 - p_0')^2}{p_0'} \, \frac{n^3}{N^{3/2}} \to \infty. \qquad (22)$$ 



The test based on $V^*$ achieves the part (18) of the detection boundary. We note that computing 
$V^*$ does not require knowledge of $p_0$, $p_1$ or $n$, and in fact, its calibration can be done without any 
knowledge of these parameters via a form of parametric bootstrap, as we do for the scan test below. 
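
The degree variance statistic only needs the node degrees, as the following sketch illustrates. It assumes the forms $V_1 = (N-1) \frac{N^{(2)}}{N^{(2)}-1} \hat{p}_0 (1 - \hat{p}_0)$, $V_2 = \frac{1}{N-2} \sum_i (W_{i\cdot} - (N-1)\hat{p}_0)^2$ and $V^* = (V_2 - V_1)/(\sqrt{N}\hat{p}_0)$; the normalizing factors are the ones that make both estimators unbiased under the null and should be treated as assumptions of this example. The function name is hypothetical, and the graph must have at least three nodes and one edge.

```python
import math


def degree_variance_statistic(W):
    """Return (V1, V2, V_star) computed from a 0/1 adjacency matrix W."""
    N = len(W)
    degrees = [sum(row) for row in W]      # W_i. for each node i
    pairs = N * (N - 1) / 2                # N^(2)
    p_hat = (sum(degrees) / 2) / pairs     # estimate of p0 from the total degree
    # Binomial-type variance estimate (assumed bias-correction factor)
    v1 = (N - 1) * pairs / (pairs - 1) * p_hat * (1 - p_hat)
    # Sample variance of the degrees; dividing by N - 2 accounts for their dependence
    v2 = sum((d - (N - 1) * p_hat) ** 2 for d in degrees) / (N - 2)
    v_star = (v2 - v1) / (math.sqrt(N) * p_hat)
    return v1, v2, v_star
```

On a star with one center and three leaves, the degrees are far more dispersed than a binomial model with the same density would suggest, and the statistic is positive.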



4.3 The scan test 

When $p_0$ is not available a priori, we have at least three options: 

• Estimate $p_0$. We replace $p_0$ with its maximum likelihood estimator under the null, i.e., 
$\hat{p}_0 = W/N^{(2)}$, and then compare the magnitude of the observed scan statistic (14) with what 
one would get under a random graph model with probability of connection equal to $\hat{p}_0$. 

• Generalized likelihood ratio test. We simply implement the actual generalized likelihood ratio 
test (Kulldorff, 1997), which rejects for large values of 

$$\max_{|S| = n} \left[ n^{(2)} h(\hat{p}_{1,S}) + (N^{(2)} - n^{(2)}) h(\hat{p}_{0,S}) - N^{(2)} h(\hat{p}_0) \right],$$ 

where $h(p) := p \log p + (1 - p) \log(1 - p)$, $\hat{p}_0$ as above, and 

$$\hat{p}_{1,S} := \frac{W_S}{n^{(2)}}, \qquad \hat{p}_{0,S} := \frac{W - W_S}{N^{(2)} - n^{(2)}},$$ 

which are the maximum likelihood estimates of $p_1$ and $p_0$ for a given subset $S$. 

• Calibration by permutation. We compare the observed value of the scan statistic to simulated 
values obtained by generating a random graph with either the same number of edges (which 
leads to a calibration very similar to the first option) or the same degree distribution 
(which is the basis for the modularity function of Newman and Girvan (2004)). 

We focus on the first option. 

Proposition 5. Assume that $\liminf p_0 N^2/n > 1$. The scan test calibrated by estimation of $p_0$ is 
asymptotically powerful for fixed expected total degree if 

$$\liminf \frac{n H(p_1)}{2 \log(N/n)} > 1. \qquad (23)$$ 

Hence, the scan test calibrated by estimation of $p_0$ achieves the entropy condition (8) without 
requiring the knowledge of $p_0$ or $p_1$. We mention that adaptation to unknown $n$ may be achieved 
as described in Section 3.5. 



4.4 Combined test and full adaptation to unknown 

A combination of the degree variance test and of the scan test calibrated by estimation of $p_0$ is seen 
to achieve the detection boundary established in Theorem 3, without requiring knowledge of $p_0$ or 
$p_1$, or even $n$. 



5 Testing in polynomial-time 

While computing the total degree (11) or the degree variance statistic (21) can be done in linear time 
in the size of the network, i.e., in $O(N^2)$ time, computing the scan statistic (14) is combinatorial in 
nature and there is no known polynomial-time algorithm to compute it. To see this, note that the 
ability to compute (14) in polynomial-time implies the ability to compute the size of the largest 
clique in the graph, since this is equal to 

$$\max\{n : W_n^* = n^{(2)}\},$$ 

and computing the size of the largest clique in a general graph is known to be NP-hard (Karp, 
1972), and even hard to approximate (Zuckerman, 2006). 

A question of particular importance in modern times is determining the tradeoff between sta- 
tistical performance and computational complexity. At the most basic level, this boils down to 
answering the following question: What can be done in polynomial-time? 
5.1 Convex relaxation scan test 

We now suggest a convex relaxation to the problem of computing the scan statistic. To do so, 
we follow the footsteps of Berthet and Rigollet (2012), who consider the problem of detecting 
a sparse principal component based on a sample from a multivariate Gaussian distribution in 
dimension $N$. Assuming the sparse component has at most $n$ nonzero entries, they show that a 
near-optimal procedure is based on the largest eigenvalue of any $n$-by-$n$ submatrix of the sample 
covariance matrix. Computing this statistic is NP-hard, so they resort to the convex relaxation of 
d'Aspremont et al. (2007), which they also study. We apply their procedure to $W^2$. 
Formally, for a positive semidefinite matrix $B \in \mathbb{R}^{N \times N}$ and $1 \leq n \leq N$, define 

$$\lambda_n^{\max}(B) = \max_{|S| = n} \lambda^{\max}(B_S),$$ 

where $B_S$ denotes the principal submatrix of $B$ indexed by $S \subset \{1, \dots, N\}$ and $\lambda^{\max}(B)$ the largest 
eigenvalue of $B$. d'Aspremont et al. (2007) relaxed this to 

$$\mathrm{SDP}_n(B) = \max_Z \mathrm{Trace}(BZ), \quad \text{subject to } Z \succeq 0,\ \mathrm{Trace}(Z) = 1,\ |Z|_1 \leq n,$$ 

where the maximum is over positive semidefinite matrices $Z = (Z_{st}) \in \mathbb{R}^{N \times N}$ and $|Z|_1 = \sum_{s,t} |Z_{st}|$. 
We consider the relaxed scan test, which rejects for large values of 

$$\mathrm{SDP}_n(W^2). \qquad (24)$$ 

When $p_0$ is known, we simply calibrate the procedure by Monte Carlo simulations, effectively 
generating $W_1, \dots, W_B$ i.i.d. from $\mathbb{G}(N, p_0)$ and computing $\mathrm{SDP}_n(W_b^2)$ for each $b = 1, \dots, B$, and 
estimating the p-value by the fraction of $b$'s such that $\mathrm{SDP}_n(W_b^2) \geq \mathrm{SDP}_n(W^2)$. Typically $B$ is a 
large number, and below we consider the limiting case where $B = \infty$. 

When $p_0$ is unknown, we estimate $p_0$ as we did for the scan test in Proposition 5, and then 
calibrate the statistic by Monte Carlo, effectively using a form of parametric bootstrap. 
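
The Monte Carlo calibration just described does not depend on the particular statistic, so it can be sketched generically. In the example below, `statistic` is any function of the adjacency matrix; computing $\mathrm{SDP}_n(W^2)$ itself requires a semidefinite programming solver and is not shown. When `p0` is omitted, the plug-in estimate $W/N^{(2)}$ is used, which corresponds to the parametric bootstrap variant. All names and default values are hypothetical.

```python
import random


def edge_count(W):
    """Total number of edges, used both as an example statistic and to estimate p0."""
    N = len(W)
    return sum(W[i][j] for i in range(N) for j in range(i + 1, N))


def sample_null(N, p, rng):
    """Draw one adjacency matrix from G(N, p)."""
    W = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                W[i][j] = W[j][i] = 1
    return W


def monte_carlo_pvalue(W, statistic, p0=None, B=200, seed=0):
    """Fraction of simulated null graphs whose statistic is >= the observed one."""
    rng = random.Random(seed)
    N = len(W)
    if p0 is None:
        # Unknown p0: plug in the estimate W / N^(2) (parametric bootstrap)
        p0 = edge_count(W) / (N * (N - 1) / 2)
    observed = statistic(W)
    exceed = sum(1 for _ in range(B)
                 if statistic(sample_null(N, p0, rng)) >= observed)
    return exceed / B
```

Note that with the plug-in estimate, the edge count itself is a poor choice of statistic, since its null distribution is then centered at the observed value; this mirrors the remark that the total degree test cannot be calibrated without knowledge of $p_0$.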

In either case, we have the following. 

Proposition 6. Assume that (1) holds and $n \leq N^{1/2 - t}$ for some $t > 0$. Then, the relaxed scan 
test is powerful if 

liminf^^=^^l^>2. (25) 
^Alog(A) Po 

To gain some insights on the relative performance of the scan test and the relaxed scan test, let 
us assume that $n^2 \ll N$, and $np_0 \gg \log(N/n)$. Applying Proposition 3 (or Proposition 5) in this 
setting, we find that the scan test is asymptotically powerful when 

$$\frac{(p_1 - p_0)^2}{p_0} \succ \frac{\log(N/n)}{n}.$$ 

Thus, comparing with (25), we lose a factor $\sqrt{N/\log(N)}$ when using the relaxed version. In the 
denser regime where $n^2 \gg N \log(N)$, the total degree test and degree variance test both have 
stronger theoretical guarantees established in Proposition 2 and Proposition 4 respectively. Below 
we explain why the $\sqrt{N/\log(N)}$ loss is not unexpected. 
Optimality 

The problem $H_0 : \mathcal{G} \sim \mathbb{G}(N, 1/2)$ versus $H_1 : \mathcal{G} \sim \mathbb{G}(N, 1/2; n, 1)$ is called the Planted (or Hidden) 
Clique Problem (Feige and Ron, 2010) and has become one of the most emblematic statistical 
problems where computational constraints seem to substantially affect the difficulty of the problem. 
Recent advances in compressed sensing and matrix completion have shown that computationally 
tractable algorithms can achieve the absolute information bounds (up to constants) in most cases. In 
contrast, in the Planted (or Hidden) Clique Problem there is no known polynomial-time algorithm 
that can detect a clique of size $n = o(\sqrt{N})$ (Dekel et al., 2011), while the clique number test can detect a 
clique of size $n \asymp \log N$, as shown in Proposition 1. In fact, the problem is provably hard in some 
computational models, such as monotone circuits (Feldman et al., 2012; Rossman, 2010). We refer 
to Berthet and Rigollet (2012) for a thorough discussion. 

More generally, we may want to characterize the sequences $(n, N, p_0, p_1)$ for which there are 
asymptotically powerful tests running in polynomial time. In our findings, the only situation where 
we found this to be true was in the dense regime, where the total degree test is both powerful in 
the large-sample limit and computable in polynomial time. (Replace this with the degree variance 
test when $p_0$ is unknown.) 



5.2 Other polynomial-time tests 
5.2.1 The maximum degree test 

Perhaps the first computationally-feasible test that comes to mind in the sparse regime is the test 
based on the maximum degree 

$$\max_{i = 1, \dots, N} W_{i\cdot}, \qquad (26)$$ 

where $W_{i\cdot}$ is the degree of node $i$ in the graph, defined in (20). 

Proposition 7. The maximal degree test is asymptotically powerful if $p_0 \gg \log(N)/N$ and 

Nlog{N)po{l-po) 

Under condition (1), the maximal degree test is asymptotically powerless if $\limsup \log(n)/\log(N) < 1$ 
and 

{Pi - Pof 



Aflog(iV)po(l-Po) 



(27) 



Comparing with Propositions 2 and 6, we observe that the maximum degree test is either less 
powerful than the relaxed scan test (when $n \leq N^{1/2 - t}$ for any $t > 0$) or less powerful than the 
total degree test (when $n \gg \sqrt{N/\log(N)}$). For unknown $p_0$, the maximum degree test is also less 
powerful than the degree variance test. 



5.2.2 Densest subgraph test 

Another possible avenue for designing computationally tractable tests for the problem at hand lies 
in algorithms for finding dense subgraphs of a given size. We follow (Khuller and Saha, 2009), 
where the reader will find appropriate references and additional results. Define the density of a 
subgraph $S \subset \mathcal{V}$ as 

$$h(S) = \frac{|E_S|}{|S|}, \quad \text{where } E_S = \{(i,j) : i, j \in S,\ i < j,\ W_{i,j} = 1\}.$$ 
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Finding S C V that maximizes h{S) may be done in polynomial-time. 
Proposition 8. Assume that $p_0 \gg \log(N)/N$.

1. Under the null hypothesis, $\max_S h(S) \sim_{P_0} h(V) \sim N p_0/2$ and this maximum is achieved at
subsets $S$ satisfying $|S| \sim N$.

2. The densest subgraph test is powerful if $\liminf n p_1/(N p_0) > 1$.

3. Assume that $n p_1/(N p_0) \to 0$. Under the alternative hypothesis, $\max_S h(S) \sim_{P_S} h(V) \sim N p_0/2$
and this maximum is achieved at subsets $S$ satisfying $|S| \sim N$.
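To make the density $h(S) = |E_S|/|S|$ concrete, here is a minimal sketch that evaluates it and searches for a dense subgraph by greedy peeling (repeatedly deleting a minimum-degree node). Greedy peeling is a swapped-in technique: it is known to give a factor-1/2 approximation of $\max_S h(S)$, and it is not the exact polynomial-time method referred to in the text. The names `density` and `greedy_peel` and the toy edge list are assumptions of this example.

```python
def density(edges, nodes):
    """h(S) = |E_S| / |S|, counting each undirected edge inside S once."""
    nodes = set(nodes)
    inside = sum(1 for (i, j) in edges if i in nodes and j in nodes)
    return inside / len(nodes)


def greedy_peel(n_nodes, edges):
    """Delete a minimum-degree node repeatedly; return the densest set seen.

    `edges` is assumed to list each undirected edge once, without self-loops.
    """
    alive = set(range(n_nodes))
    nbrs = {v: set() for v in alive}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    m = len(edges)  # number of edges among the nodes still alive
    best_set, best_h = set(alive), m / n_nodes
    while len(alive) > 1:
        v = min(alive, key=lambda u: len(nbrs[u]))
        m -= len(nbrs[v])
        for u in nbrs[v]:
            nbrs[u].discard(v)
        alive.remove(v)
        del nbrs[v]
        h = m / len(alive)
        if h > best_h:
            best_set, best_h = set(alive), h
    return best_set, best_h


# Toy graph (hypothetical): a 4-clique on {0,1,2,3} with a pendant path 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
best_set, best_h = greedy_peel(6, edges)
print(sorted(best_set), best_h)
```

On this toy graph the peeling removes the two path nodes first and stops improving once only the clique remains.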

The condition $\liminf n p_1/(N p_0) > 1$ is stronger than what we have obtained for the relaxed scan
test (25) in the sparser case ($n \le N^{1/2-t}$ for any $t > 0$) and than what we have obtained for
the total degree test (13) and the degree variance test (22) in the less sparse case ($n \ge \sqrt{N}$). If
$n p_1/(N p_0) \to 0$, then the densest subgraph statistic seems to behave like the total degree statistic,
and we therefore expect similar performance, although we have no proof of this statement.

To improve the power, we would like to restrict our attention to subgraphs of size $n$
(assumed known for now) and use $\max_{|S|=n} h(S)$. Computing this, however, is NP-hard, and there
is no known polynomial-time approximation within a constant factor. Nevertheless, the
variant statistic $\max_{|S| \ge n} h(S)$ can be approximated within a constant factor in polynomial time.
However, the power of the resulting test is not improved: since the statistic $\max_{|S| \ge n} h(S)$ can
only be approximated within a constant factor, the resulting test is powerful only if $n p_1 \ge C N p_0$,
where $C$ is a positive constant that depends on this approximation factor.

6 Discussion 

With this paper, we have established the fundamental statistical (information theoretic) difficulty
of detecting a community in a network, modeled as the detection of an unusually dense subgraph
within an Erdos-Renyi random graph, in the quasi-normal regime where $n p_0$ is not too small as
made explicit in (1). The quasi-Poisson regime, where $n p_0$ is smaller, requires different arguments
and the application of somewhat different tests, and this will be detailed in a separate paper in
preparation.

For the time being, in the quasi-normal regime, we learned the following. In the moderately
sparse setting ($n \gg N^{2/3}$ for known $p_0$ and $n \gg N^{3/4}$ for unknown $p_0$) the detection boundary
is achieved by polynomial-time tests. In the sparser setting, there is a large discrepancy between
the information theoretic boundaries and the performance of known polynomial-time tests, which, in view
of the Planted Clique Problem, is not surprising.

It is of great interest to study this optimal detection boundary, this time under computational
constraints, a theme of contemporary importance in statistics, machine learning and computer
science. This promisingly rich line of research is well beyond the scope of the present paper.

7 Proofs 

7.1 Auxiliary results 

The following is Chernoff's bound for the binomial distribution. Recall the definition of $H$ in
(2).

Lemma 1 (Chernoff's bound). For any positive integer $n$, any $q, p_0 \in (0, 1)$, we have

$$P\big(\mathrm{Bin}(n, p_0) \ge qn\big) \le \exp(-nH(q)). \qquad (28)$$

A consequence of Chernoff's bound is Bernstein's inequality for the binomial distribution.

Lemma 2 (Bernstein's inequality). For positive integer $n$, any $p_0 \in (0, 1)$ and any $x > 0$, we have

$$P\big[\mathrm{Bin}(n, p_0) \ge np_0 + x\big] \le \exp\left[-\frac{x^2}{2[np_0(1-p_0) + x/3]}\right].$$
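The two tail bounds can be checked numerically against the exact binomial tail. The sketch below assumes that $H$ in (2) is the usual Bernoulli relative entropy $H_p(q) = q\log(q/p) + (1-q)\log((1-q)/(1-p))$ and only uses thresholds $q \ge p_0$; the function names and the parameter values are illustrative assumptions.

```python
from math import comb, exp, log


def H(q, p):
    """Bernoulli relative entropy H_p(q), assumed to match definition (2)."""
    out = 0.0
    if q > 0:
        out += q * log(q / p)
    if q < 1:
        out += (1 - q) * log((1 - q) / (1 - p))
    return out


def binom_tail(n, p, k):
    """Exact P(Bin(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def bounds(n, p0, k):
    """Return (exact tail, Chernoff bound, Bernstein bound) for threshold k >= n*p0."""
    q, x = k / n, k - n * p0
    chernoff = exp(-n * H(q, p0))
    bernstein = exp(-(x**2) / (2 * (n * p0 * (1 - p0) + x / 3)))
    return binom_tail(n, p0, k), chernoff, bernstein


for k in (30, 40, 60):
    print(k, bounds(200, 0.1, k))
```

In each row the exact tail should sit below the Chernoff bound, which in turn should sit below the Bernstein bound, in line with Bernstein's inequality being a consequence of Chernoff's.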
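We will need the following basic properties of the entropy function.

Lemma 3. For $p_0 \in (0, 1)$, $H(q)$ is convex in $q \in [0, 1]$. Moreover,

$$H_p(q) = \begin{cases} \dfrac{(q-p)^2}{2p(1-p)}\,(1 + o(1)), & q/p \to 1; \\[2mm] p\,(r\log r - r + 1)\,(1 + o(1)), & q/p \to r \in (1, \infty),\ p \to 0; \\[2mm] q\log(q/p) + O(q), & q/p \to \infty. \end{cases} \qquad (29)$$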



We will also use the following bounds on the binomial coefficients.

Lemma 4. For any integers $1 \le k \le n$,

$$k\log(n/k) \le \log\binom{n}{k} \le k\log(ne/k), \qquad (30)$$

where $e = \exp(1)$.
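The bounds (30) are easy to confirm by brute force for small values. The following sketch does so for all $1 \le k \le n < 60$; the helper name `lemma4_holds` and the small floating-point tolerance are choices made for this example.

```python
from math import comb, e, log


def lemma4_holds(n, k, tol=1e-12):
    """Check k*log(n/k) <= log C(n, k) <= k*log(n*e/k), up to rounding."""
    middle = log(comb(n, k))
    return k * log(n / k) <= middle + tol and middle <= k * log(n * e / k) + tol


print(all(lemma4_holds(n, k) for n in range(1, 60) for k in range(1, n + 1)))
```

The lower bound is tight at $k = 1$ and $k = n$, which is why a tolerance is used in the comparison.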



The next result bounds the hypergeometric distribution with the corresponding binomial distribution.
Let $\mathrm{Hyp}(N, m, n)$ denote the hypergeometric distribution counting the number of red
balls in $n$ draws from an urn containing $m$ red balls out of $N$.

Lemma 5. $\mathrm{Hyp}(N, m, n)$ is stochastically smaller than $\mathrm{Bin}(n, m/(N - m))$.

Proof. Suppose the balls are picked one by one without replacement. At each stage, the probability
of selecting a red ball is smaller than $m/(N - m)$. The result follows. □
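The comparison in Lemma 5 can be illustrated numerically in the case $m = n$, which is the case used in the proofs that follow (where $K \sim \mathrm{Hyp}(N, n, n)$). The sketch compares exact upper tails; the function names and the tested values of $N$ and $n$ are assumptions of this example.

```python
from math import comb


def hyp_tail(N, m, n, k):
    """P(Hyp(N, m, n) >= k): red balls among n draws without replacement."""
    total = comb(N, n)
    hits = sum(comb(m, j) * comb(N - m, n - j) for j in range(k, min(m, n) + 1))
    return hits / total


def bin_tail(n, p, k):
    """P(Bin(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def dominated(N, n):
    """Check P(Hyp(N, n, n) >= k) <= P(Bin(n, n/(N-n)) >= k) for every k."""
    rho = n / (N - n)
    return all(
        hyp_tail(N, n, n, k) <= bin_tail(n, rho, k) + 1e-12 for k in range(n + 1)
    )


print(all(dominated(N, n) for N in (20, 50, 200) for n in (2, 5, 8)))
```

Comparing upper tails for every threshold $k$ is exactly what stochastic ordering means for integer-valued variables.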

7.2 Proof of Theorem 1 

Following standard lines, we start by reducing the composite alternative to a simple alternative by
considering the uniform prior $\pi$ on subsets $S \subset [N] := \{1, \dots, N\}$ of size $|S| = n$. The resulting
likelihood ratio is

$$L = \frac{\#\{S \subset [N] : |S| = n,\ W_S = n^{(2)}\}}{\binom{N}{n}\, p_0^{n(n-1)/2}}, \qquad (31)$$

which is the observed number of cliques of size $n$ divided by the expected number under the null.
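For very small graphs, the likelihood ratio (31) can be evaluated by brute force: count the cliques of size $n$ and divide by the expected count $\binom{N}{n} p_0^{n(n-1)/2}$ under the null. The sketch below is exponential in $N$ and is meant only to illustrate the formula; the function name and the toy graph are assumptions of this example.

```python
from itertools import combinations
from math import comb


def clique_likelihood_ratio(adj, n, p0):
    """Observed number of n-cliques divided by its null expectation."""
    N = len(adj)
    cliques = sum(
        1
        for S in combinations(range(N), n)
        if all(adj[i][j] for i, j in combinations(S, 2))
    )
    return cliques / (comb(N, n) * p0 ** (n * (n - 1) // 2))


# Toy graph (hypothetical): a triangle on {0, 1, 2} plus the edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(clique_likelihood_ratio(adj, 3, 0.5))
```

Here there is one triangle among the four candidate triples, against an expected count of one half, so the ratio exceeds one.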

The risk of any test for the original problem is well-known to be bounded from below by the
risk of the likelihood ratio test $\{L > 1\}$ for this 'averaged' problem, which is equal to

$$\gamma_L := P_0(L > 1) + E_0\big(L\,\mathbb{1}\{L \le 1\}\big).$$

Therefore, it suffices to show that $\gamma_L \to 1$. Here we use arguably the simplest method, a second
moment argument, which is based on the fact that

$$\gamma_L = 1 - \tfrac{1}{2} E_0|L - 1| \ge 1 - \tfrac{1}{2}\sqrt{\mathrm{Var}_0(L)},$$

by the Cauchy-Schwarz inequality, so that it is enough to prove that $\mathrm{Var}_0(L) \to 0$. We do so by
showing that $E_0(L^2) \le 1 + o(1)$.
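Note that

$$L = p_0^{-n^{(2)}}\, \pi\big[\mathbb{1}\{W_S = n^{(2)}\}\big],$$

where $\pi[\cdot]$ denotes the expectation with respect to $\pi$. Hence, by Fubini's theorem, we have

$$E_0 L^2 = \pi^{\otimes 2}\Big[p_0^{-2n^{(2)}}\, P_0\big(W_{S_1} = W_{S_2} = n^{(2)}\big)\Big] = \pi^{\otimes 2}\Big[p_0^{-K(K-1)/2}\Big],$$

where $K := |S_1 \cap S_2|$. Indeed, the event $\{W_{S_1} = W_{S_2} = n^{(2)}\}$ means that all edges between pairs of
nodes in $S_1$ exist, and similarly for $S_2$, and there are a total of $n(n-1) - K(K-1)/2$ such edges.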
Before going further, note that (3) and (30) imply that

$$\log(N/n) - \frac{n-1}{2}\log(1/p_0) \to \infty. \qquad (32)$$

In particular, this means that $n \le 3\log N$, eventually, and therefore

$$\frac{n^2}{N} = O\big((\log N)^2/N\big) \to 0. \qquad (33)$$



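Since $K \sim \mathrm{Hyp}(N, n, n)$, by Lemma 5, $K$ is stochastically bounded by $\mathrm{Bin}(n, \rho)$, where
$\rho := n/(N - n)$. Hence, with Lemma 1, we have

$$P(K \ge k) \le P\big(\mathrm{Hyp}(N, n, n) \ge k\big) \le P\big(\mathrm{Bin}(n, \rho) \ge k\big) \le \exp\big(-nH_\rho(k/n)\big). \qquad (34)$$

Now, using Lemma 3 and (33), for $k \ge 2$ we get

$$nH_\rho(k/n) = k\log\big(k/(n\rho)\big) + O(k) = k\log(kN/n^2) + O(k).$$

Hence,

$$\pi^{\otimes 2}\Big[p_0^{-K(K-1)/2}\Big] \le P(K \le 1) + \sum_{k=2}^{n}\exp\Big(\frac{k(k-1)}{2}\log(1/p_0) - nH_\rho(k/n)\Big)$$
$$\le 1 + \sum_{k=2}^{n}\exp\Big(k\Big[\frac{k-1}{2}\log(1/p_0) - \log(kN/n^2) + O(1)\Big]\Big). \qquad (35)$$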



For $a > 0$ fixed, the function $x \mapsto ax - \log x$ is decreasing on $(0, 1/a)$ and increasing on $(1/a, \infty)$.
Therefore, for $2 \le k \le n$,

$$\frac{k-1}{2}\log(1/p_0) - \log(kN/n^2) \le -\omega,$$

where

$$\omega := \min\Big(\log(N/n^2) - \frac{1}{2}\log(1/p_0),\ \log(N/n) - \frac{n-1}{2}\log(1/p_0)\Big).$$

By (32), the second term in the minimum tends to $\infty$. This is also the case of the first term, since

$$\log(N/n^2) - \frac{1}{2}\log(1/p_0) = \Big[\log(N/n) - \frac{n-1}{2}\log(1/p_0)\Big] + \Big[\frac{n-2}{2}\log(1/p_0) - \log n\Big],$$

with the second difference bounded from below. Hence, $\omega \to \infty$. Hence, the sum in (35) is bounded
by

$$\sum_{k=2}^{n}\exp\big(-k[\omega + O(1)]\big) \le \frac{e^{-2[\omega + O(1)]}}{1 - e^{-[\omega + O(1)]}} \to 0,$$

eventually.

Hence we showed that $E_0(L^2) \le 1 + o(1)$ and the proof of Theorem 1 is complete.
7.3 Proof of Theorem 2 

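We assume that (1), (7) and (8) hold. We reduce the composite alternative to a simple alternative
by considering the uniform prior $\pi$ on subsets $S \subset [N] := \{1, \dots, N\}$ of size $|S| = n$. The resulting
likelihood ratio is

$$L(W) = \binom{N}{n}^{-1}\sum_{|S| = n} L_S(W) = \pi[L_S(W)], \qquad (36)$$

where $\pi[\cdot]$ is the expectation with respect to $S \sim \pi$, $W = (W_{i,j} : 1 \le i < j \le N)$ and

$$L_S := \exp\big(\theta W_S - \Lambda(\theta)\, n^{(2)}\big), \qquad (37)$$

with

$$\theta := \theta_{p_1}, \qquad \theta_q := \log\Big(\frac{q(1-p_0)}{p_0(1-q)}\Big), \qquad (38)$$

and

$$\Lambda(\theta) := \log(1 - p_0 + p_0 e^{\theta}),$$

which is the cumulant (log-moment) generating function of $\mathrm{Bern}(p_0)$.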

Still leaving $p_0$ implicit, let $H(q)$ be short for $H_{p_0}(q)$. It is well-known that $H$ is the
Fenchel-Legendre transform of $\Lambda$; more specifically, for $q \in (p_0, 1)$,

$$H(q) = \sup_{\theta \ge 0}\,[q\theta - \Lambda(\theta)] = q\theta_q - \Lambda(\theta_q). \qquad (39)$$
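The variational identity (39) can be verified numerically. The sketch below assumes the standard maximizer $\theta_q = \log\frac{q(1-p_0)}{p_0(1-q)}$ (taken here as an assumption about the paper's definition of $\theta_q$) and that $H$ is the Bernoulli relative entropy; it compares the closed form with a crude grid search over $\theta \ge 0$. The values $p_0 = 0.1$ and $q = 0.3$ are arbitrary.

```python
from math import exp, log


def Lambda(theta, p0):
    """Log moment generating function of Bern(p0)."""
    return log(1 - p0 + p0 * exp(theta))


def H(q, p0):
    """Bernoulli relative entropy H_{p0}(q), for 0 < q < 1."""
    return q * log(q / p0) + (1 - q) * log((1 - q) / (1 - p0))


def theta_q(q, p0):
    """Assumed maximizer of theta -> q*theta - Lambda(theta)."""
    return log(q * (1 - p0) / (p0 * (1 - q)))


p0, q = 0.1, 0.3
t = theta_q(q, p0)
closed_form = q * t - Lambda(t, p0)
# Grid search over theta in [0, 10] with step 0.001.
grid_sup = max(q * (i / 1000) - Lambda(i / 1000, p0) for i in range(10001))
print(closed_form, grid_sup, H(q, p0))
```

The three printed numbers should agree up to the resolution of the grid, since the objective is concave in $\theta$.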

The second moment argument used in Section 7.2 is also applicable here, though it does not yield 
sharp bounds. In Case 1 below (see Subsection 7.3.3), which is the regime where the moderate 
deviations of the binomial come into play, this method leads to a requirement that the limit superior 
in (8) be bounded by 1/2 instead of 1. And, worse than that, in Case 3 below, which is the regime 
where the large deviations of the binomial are involved, it does not provide any useful bound 
whatsoever. 

Fortunately, a finer approach was suggested by Ingster (1997). The refinement is based on
bounding the first and second moments of a truncated likelihood ratio. Here we follow Butucea and Ingster
(2011). They work with the following truncated likelihood

$$\tilde L = \binom{N}{n}^{-1}\sum_{|S| = n} L_S\,\mathbb{1}_{\Gamma_S},$$

where the events $\Gamma_S$ will be specified below. We note $\Gamma = \cap_{|S| = n}\Gamma_S$. Using the triangle inequality,
the fact that $\tilde L \le L$ and the Cauchy-Schwarz inequality, we have the following upper bound:

$$E_0|L - 1| \le E_0|\tilde L - 1| + E_0(L - \tilde L) \le \sqrt{E_0[\tilde L^2] - 1 + 2(1 - E_0[\tilde L])} + (1 - E_0[\tilde L]),$$

so that $\gamma_L \to 1$ when $E_0[\tilde L^2] \to 1$ and $E_0[\tilde L] \to 1$. Note that contrary to Butucea and Ingster (2011),
we do not require that $P_0(\Gamma) \to 1$. More precisely, we shall prove that $(1, 1)$ is an accumulation
point of any subsequence of $(E_0\tilde L, E_0[\tilde L^2])$. Adopting this approach allows us to assume that $p_1/p_0$
converges to $r \in [1, \infty]$, $p_1^2/p_0$ converges to $r_2 \in [0, \infty]$ and that

$$\frac{nH(p_1)}{2\log(N/n)} \le 1 - \eta_0 \qquad (40)$$

for some $\eta_0 \in (0, 1)$ fixed. Notice that (5) and (8) imply that $H(p_1) \to 0$, which by Lemma 3 forces
either $p_1/p_0 \to 1$ or $p_1 \to 0$; in any case, $p_1$ is bounded away from 1 this time.

In what follows, we provide the general arguments while the proof of the technical results
(Lemmas 6-8) is postponed to the end of the section. To show these technical results, we divide
the analysis depending on the behaviour of $p_1/p_0$:

$$\frac{p_1}{p_0} \to r = 1, \qquad (41)$$
$$\frac{p_1}{p_0} \to r \in (1, \infty), \qquad (42)$$
$$\frac{p_1}{p_0} \to r = \infty. \qquad (43)$$

In regime (41), the moderate deviations of the binomial distribution dominate and these are
asymptotically equivalent to normal (Gaussian) deviations; in particular, it is in this setting (with $p_0$
constant) that Butucea and Ingster (2011) successfully reduce the binary setting to the normal
setting. In regime (43), the large deviations of the binomial distribution dominate, which are
unlike the normal deviations and lead to a completely different regime. Regime (42) is intermediary
and requires special treatment.

First, we need some notations to introduce Tg- Define the numbers 



1 + 2- 



log(iV/n) 



log 1 + 



po(l-po) ) . 



A n , 



1 + 2 



log (^) - log \ 


log 




Alog(iV/n)| 


(log(JV/n)) 


log 


\ po(l-po) ) 





A n 



(44) 



(45) 



The exact expression of $k_{\min}$ will be useful for bounding the second moment of $\tilde L$. For the time
being, we only need to have in mind the properties summarized in the following lemma.

Lemma 6. We have $k_{\min} \to \infty$, $k_{\min} \sim k_*$, and $\log(n/k_{\min}) = o[\log(N/n)]$.

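We define $\Gamma_S$ as follows:

$$\Gamma_S := \bigcap_{k = \lfloor k_{\min}\rfloor + 1}^{n}\big\{W_T \le w_k,\ \forall T \subset S \text{ such that } |T| = k\big\}, \qquad (46)$$

where $w_k := q_k k^{(2)}$, with

$$\frac{k-1}{2}H(q_k) = \log(N/k) + 2. \qquad (47)$$

This construction is possible by the following lemma, which serves as a definition.

Lemma 7. For any integer $k$ between $k_{\min} + 1$ and $n$, there exists a unique $q_k \in (p_0, 1)$ such that

$$\frac{k-1}{2}H(q_k) = \log(N/k) + 2.$$

Moreover, $q_k$ satisfies $\theta_{q_k} \le 2\theta$.

7.3.1 First truncated moment

We first prove that $E_0\tilde L \to 1$. By Fubini's theorem, we have

$$E_0\tilde L = \pi\big[E_0[L_S\mathbb{1}_{\Gamma_S}]\big] = \pi[P_S(\Gamma_S)] = P_S(\Gamma_S),$$

where $S$ is any fixed subset of size $n$ in $\{1, \dots, N\}$ and this last equality is by the fact that $P_S(\Gamma_S)$
does not depend on $S$ by symmetry. By the union bound, Chernoff's bound (28) and (30),

$$1 - P_S(\Gamma_S) \le \sum_{k = \lfloor k_{\min}\rfloor + 1}^{n}\ \sum_{T \subset S,\ |T| = k} P_S(W_T > w_k)$$
$$\le \sum_{k = \lfloor k_{\min}\rfloor + 1}^{n}\binom{n}{k} P\big(\mathrm{Bin}(k^{(2)}, p_1) \ge q_k k^{(2)}\big)$$
$$\le \sum_{k = \lfloor k_{\min}\rfloor + 1}^{n}\exp\Big[k\Big(\log(ne/k) - \frac{k-1}{2}H_{p_1}(q_k)\Big)\Big].$$

We then conclude that $1 - P_S(\Gamma_S) = o(1)$ using the following result.

Lemma 8. We have

$$\min_{k = \lfloor k_{\min}\rfloor + 1, \dots, n}\Big(\frac{k-1}{2}H_{p_1}(q_k) - \log\Big(\frac{n}{k}\Big)\Big) \to \infty. \qquad (48)$$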

7.3.2 Second truncated moment 

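We now prove that $E_0\tilde L^2 \le 1 + o(1)$, which with $E_0\tilde L \to 1$ shows that $\mathrm{Var}_0(\tilde L) \to 0$. Let $S_1, S_2 \sim \pi$
and define $K = |S_1 \cap S_2|$. By Fubini's theorem, we have

$$E_0\tilde L^2 = E_{S_1, S_2}E_0\big[L_{S_1}L_{S_2}\mathbb{1}_{\Gamma_{S_1}\cap\Gamma_{S_2}}\big]$$
$$= \pi^{\otimes 2}\Big[E_0\Big(\exp\big(\theta(W_{S_1} + W_{S_2}) - 2\Lambda(\theta)n^{(2)}\big)\mathbb{1}_{\Gamma_{S_1}\cap\Gamma_{S_2}}\Big)\Big].$$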

Define 

WsxT = l E 
and note that Ws = WsxS- We use the decomposition 

Ws, + Ws, = Ws, X (5AS2) + Ws, X + 2W^Sin52 , (49) 

the fact that 

Ts, n Ts, C {Ws,nS2 < wk} , 
and the independence of the random variables on the RHS of (49), to get 

Eo (exp (d{Ws, + Ws,)- 2A(0)n(2)) Ir^^nrs,) < I • II • HI , 

where 

I:=Eoexp (^eWs^y^^sAS^) ' ^(^^ ' K){n + K - 1)^ =1 , 
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II:=Eoexp^VF52x(52\5i) 



A{9) 



{n-K){n + K-V 



1 , 



III:=Eo exp 20T^5in52-2A(0 



1 



The first two equalities are due to the fact that the likelihood integrates to one. 

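To bound III, we follow Butucea and Ingster (2011), with a twist. When $K \le k_{\min}$, we will use
the obvious bound:

$$\mathrm{III} \le E_0\exp\big(2\theta W_{S_1\cap S_2} - 2\Lambda(\theta)K^{(2)}\big) = \exp\big(\Delta K^{(2)}\big),$$

where

$$\Delta := \Lambda(2\theta) - 2\Lambda(\theta) = \log\Big(1 + \frac{(p_1 - p_0)^2}{p_0(1 - p_0)}\Big). \qquad (50)$$

When $K > k_{\min}$, we use a different bound. For any $\xi \in (0, 2\theta)$, we have

$$\mathrm{III} \le E_0\Big[\exp\big(\xi W_{S_1\cap S_2} + (2\theta - \xi)w_K - 2\Lambda(\theta)K^{(2)}\big)\,\mathbb{1}\{W_{S_1\cap S_2} \le w_K\}\Big]$$
$$\le E_0\exp\big(\xi W_{S_1\cap S_2} + (2\theta - \xi)w_K - 2\Lambda(\theta)K^{(2)}\big),$$

so that

$$\mathrm{III} \le \exp\big(\Delta_K K^{(2)}\big),$$

where

$$\Delta_k := \min_{\xi \in [0, 2\theta]}\ \Lambda(\xi) + (2\theta - \xi)q_k - 2\Lambda(\theta). \qquad (51)$$

By the variational definition of the entropy (39), the minimum of $\Lambda(\xi) + (2\theta - \xi)q_k - 2\Lambda(\theta)$ over $\xi$
in $\mathbb{R}^+$ is achieved at $\xi = \theta_{q_k}$, and we know from Lemma 7 that $\theta_{q_k} \le 2\theta$. Hence, we have

$$\Delta_k = -H(q_k) + 2\theta q_k - 2\Lambda(\theta) = -2H_{p_1}(q_k) + H(q_k). \qquad (52)$$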



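Following our tracks, we have

$$E_0\tilde L^2 \le E\Big[\mathbb{1}\{K \le k_{\min}\}\exp\big(\Delta K^{(2)}\big)\Big] + E\Big[\mathbb{1}\{K > k_{\min}\}\exp\big(\Delta_K K^{(2)}\big)\Big],$$

where the expectation is with respect to $\pi^{\otimes 2}$.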

Let b be an integer sequence such that 6 — )• co so slowly that 



{Pi -Po) bn^ 



0, 



(53) 



which is possible because of (7). Recall that p = n/{N — n) and define /cq = \hnp\. We divide the 
expectation into two parts: K < ko and ko -\- 1 < K < n. When ko = 1, we simply have 



l{K<ko} exp (ak(2)) j = F{K < 1) < 1 . 



E 



When ko > 2, we use the expression (50) of A to derive 



E 



l{/<<fco}exp(AK(2))j < exp[A/c, 



< 



exp 



0(1) 



{Pi - Po 



|2 b^n^ 



Po(l-Po) 



1 + 0(1) 
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because of (53). 

When $k_0 + 1 \le K \le \lfloor k_{\min}\rfloor$, we use the bound (34) and the inequality $(1 - x)\log(1 - x) \ge -x$, to get



E 



L^min J 

fc=fco + l 
L^min J 

A;=feo+l 



^A:(fc-1) /A; 
^ 2 ~ " 



n 



For $a > 0$ fixed, the function $f(x) = ax - \log x$ is decreasing on $(0, 1/a)$ and increasing on $(1/a, \infty)$.
Therefore, for $k_0 + 1 \le k \le n$,

k-1 , / k\ 

- - log — <-uj, 
\npj 



A- 



where 



uj := mm 



log6- A- 



kn-1 



-, log 



krt 



np 



A 



k ■ — ^ 



From what we did previously, we know that $\Delta(k_0 - 1) = o(1)$, so that the first term in the minimum
tends to $\infty$. Therefore, it suffices to look at the second term in the minimum. In fact, $k_{\min}$ has
been precisely defined in (45) to make this second term diverge. Indeed, by (45) and (50), we have

- log log 



kn 



A-'^mm 1 , , / ^k 
< log 



n 



log{N/n) 



By Lemma 6 and since p x n/N = o(l), we get log{krain/{np)) — log {^^) = o{l). Consequently, 

kn 



bi -^111111 \ A ^min 1 
g 1 - A- 

np 



> log log 



n 



log{N/n) 



+ o(l) oo 



because of (5). 

When K > kmm, we have 



E 



- ^ ^^P 



, , . k-1 / , 

k { Afc^- - log f — 1 + 1 



Now, using (52), we have 



^ k-1 ^ { k 
Afc^— - log — 
2 \np 



k-1 



2Hp,{qk) + H{qk)] - log - + 2 log - + o(l 



N 



k 



n 



which goes to $-\infty$ uniformly over all $k$ between $\lfloor k_{\min}\rfloor + 1$ and $n$ by the definition (47) of $q_k$ and
by the control of $H_{p_1}(q_k)$ from Lemma 8. Hence, the sum above tends to zero.
This concludes the proof that $E_0\tilde L^2 \le 1 + o(1)$.



7.3.3 Proof of Lemma 6 

We only need to prove that A;* — t- oo and that log(n//c*) = o [log(A^/n)] since 



log { log 



n 



\og{N/n) 



A \og{N/n) \ = o{\og{N/n)) . 
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We divide the analysis into three cases depending on the behaviour of pi/po- 
CASE 1: pi/po 1. Then, Lemma 3 tells us log (l + ^jj^l^) ~ 2ff(^Ji), so that 

, ^ log(Af/n) 
H{pi) 

since H{pi) < 2(1 - 770) log(iV/n)/n by (40). Hence k* ^ 00 and log(n/A;*) = 0(1). 

CASE 2: pi/po — )■ r with r G (1, 00). Since H{pi) goes to 0, this enforces po — )• 0. Using Lemma 3 
and (40), we derive that 

Po log(r) — r + 1] -< log(A^/n)/n . 
Hence, log(A^/n)/po >~ n. Going back to the definition of k^, we derive that 

log{N/n) 



k* y 



An >~ n 



Po{r - ly 

CASE 3: pi/po — ^ 00. Again, we have po — )■ 0. By Lemma 3 and (40), 



Hence, 



pi\ log iV/n 

pi log — -< . (54) 

\PoJ n 



log ( ^ ) ^ log [log(iV/n)/(npo)] = o[log(7V/n)], 



where the last part comes from (1). Hence, 

log(7V/n^ 

k y - — - — — 00 . 

log{pi/po) 

Since (54) also implies that pi -< log{N/n)/n, we have 

n_ ^ nlogjl + pI/po) ^ ^ ^ npl ^ ^ ^ log{N/n) 
k* log{N/n) log{N/n)pQ npo 

so that log{n/k*) < log [log{N/n)/{npo)] V + 0(1) = o[log{N/n)] by (1) 

7.3.4 Proof of Lemma 7 

Define q by the equation 

q _ pI{1 -po) 



V 1 



1-q po{l -Pi^ 



(55) 



which implies Og = 29. Because H is strictly increasing and continuous on (po)^)) to prove the 
existence of qj. it suffices to show that 

^'"'°~^ g(g)>log(iVAw) + 2. 

As in the proof of the previous lemma, we consider different cases depending on the convergence 
oipi/pQ and oip\/pQ. hi all cases, except the last one, we show that 

k,H{q) > 2(l + e)log(iV/n), 
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for some fixed e > 0, which suffices by Lemma 6. If fc* < n, so that > ^log{N/n) (with A 
defined in (50)). li k* = n and fcmin < n, we have k^, > log{N/n){l + o(l)). Hence, it is enough 
the prove that 

H{q) > (1 + e)A, for some fixed e > 0. 

The last case, Case 3(c) below — which corresponds to po = o{log{N / n) / n) and log(n) = 
o(log(A'^)) — requires a more delicate treatment. 

CASE 1: pi/po 1. By the definition of g, we have q-po = {pi-po) 1 + pp^2p,^^+p^ ~ '^{Pi-Po) 
and Lemma 3 tells us that 

Po(l-Po) - 

CASE 2: Pi/pq — s- r with r G (l,oo). Note that this forces pi — t- 0. Here (55) implies that q/pQ ~ 
(pi/po)'^, so that H{q) ~ po (^r'^ log{r'^) — + l) by Lemma 3. At the same time, A ~ pQ^r — 1)^, 
so that 

H{q) log(r2) — + 1 2r (r log(r) — r + 1) 



1+ ' > 1 . 



A (r-l)2 (r-1) 

CASE 3(a): pi/po — )• oo and Pi/po — )• 0. We have q/po ~ {pi/pof' oo, implying that H{q) ~ 

2 

?log(g/po) ~ 2(pf/po) log(pi/po) by Lemma 3. Also, A ~ log(l +pf/po) ~ ^- Hence, H{q} > A. 

CASE 3(b): pi/po — )• oo and Pi/po r2 £ {0,oo). Here q ^ 1/(1 + r2), so that q/po — )■ oo, 
implying that H{(j) ~ q^og{q/po) x log(l/po) — ^ oo. Also, A — )• log(l + r2). Hence, -ff(g) ^ A. 

CASE 3(c): pf/po — oo. By Definition (44) of /c*, this implies k^ < n. By definition of q, we 
have q = 1 — o(l), so that H{q) ~ log(l/po)- On the other hand, A ~ log(pi/po)- Therefore, 

H{q) log(l/po) _ 1 



A log(p2/po) ^ _ log(p|) ' 

log(Po) 

so that we are done if log(pi)/log(po) is bounded away from 0. When log(j>i)/log(po) = o{l), we 
need to work a little harder and perform a second order analysis. From the definition of q, we 
derive 1 — q < 22- so that 

1 _ PO PO 



H{q) > Hil -^2i) = ^i_q) iog( ^) + ^ log(-^) = (1 - ^) log(-) + o(l). 

Pi Pi Po Pi 1 - Po Pi Po 



Hence, 



A 



H{q) ^ ^ log(l/p2)-|log(l/po)-o(l) 

log (I) +0(1) 

. , N / 1 PO log(Po) I \ 

21og(lM) [ ^~^?ki(^+"^^^ 
l°g(l/Po) Vl-^ + o(l) 
log(l/pi) 



> 



(2 + o(l)) 



log(l/po) 
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since pl/po — ^ oo. We use this lower bound to get 



^min 1 



H{q) > [log(iV/fc,)-21og(n//e,)-loglog(n/log(Af/n))] 



A 



> 



[log(iV/A:^in) + 2] 



l + (2 + o(l)) 



1 



2 + o(l) + 21og(n/A;*) + loglog(n/log(iV/n)) 



log(iV/n) 



log(l/pi) 
log(l/po) 



where we used Lemma 6 in the second inequahty. In order to conclude, because of (5), it suffices 
to show that 

\og{n/h) + loglog(n/log(iV/n)) log(l/pi) 



log(A^/n) 



log(l/po) 



(56) 



The bound (54), coupled with pi » ^/po, implies that 21oglog(A^/n) — log(n) + log(l/(npo)) oo. 
This, together with (1), forces log(7T,) = o [log(A^/n)]. Hence, 



log(l/po) _ log(n) + log(l/(npo)) 



log (iV/n) 



log{N/n) 



It remains to show that 



log(n/A;*) + loglog(n/log(A^/n)) 



o(l) 



0(1). 



log(l/pi) 
By definition of A;* 

log(n/A;*) < log(n/log(iV/n)) + log(A) < log(n/ log(7V/n)) + loglog(p^/po), 
so that, because of (5) and (54), we have 
log(n/fc*) + log log(n/ \og{N/n)) ^ log(n/ log(A^/n)) + log log(pf /po) + log log(n/ log(A^/n)) 



log(l/pi 



log(n/log(iV/n)) + loglog(pi/po 



0(1). 



7.3.5 Proof of Lemma 8 

We first note that, by the entropy bound (40) involving pi, the definition of qk Lemma 7, definition 
of q in (55), and the fact that H{q) is strictly increasing over q > po, we have 



Pi <qk<q, <n . 



(57) 



CASE 1: pi/po — 5- 1. In the proof of Lemma 7 (Case 1), we have shown that q defined in (55) 
satisfies q ^ pq. By (57), we then get qk ^ Po ^ Pi- Then using Lemma 3 and the bound on the 
entropy (40), we get 



{qk-PoY H{qk 



> 



n 



> 



1 



{Pi-PoY H{pi) {l-rio)k l-r?o 
Hence, we may lower bound Hp-^{qk)Qs follows: 



(58) 



Hp^ {qk 



(qk-pi) 



{qk - pof 



2pi(l-pi) 2po(l-Po) 



qk -po 
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which allows us to conclude that 



-j^H{qk) y log(iV/n) » log (^J V 1 



where the last inequality follows from Lemma 6 and the fact that k > /cmin- 

CASE 2: pi/po — ?• r G (l,oo). As in the proof of Lemma 7 (Case 2), we have pi — )• 0. In the 
proof of Lemma 7 (Case 1), we have shown that q/po — r'^ and that g — t- 0. By (57), we can use 
the second asymptotic expression of the entropies in Lemma 3. The inequalities in (58) still hold, 
giving 



1 - r?o ~ H{pi) 



< 



£iL log m 



PO 



•log(r) — r + 1 



f{qk/Po) 
fir) 



(59) 



where f{x) := xlog(x) — x + Since / is convex and satisfies f'{x) = log(x), we have f[x) — f{r) < 
{x — r) log(x) for x > r > 1. Taking x = Qk/po and using (59), we derive that 



eventually. As a consequence, qk/po is also lower bounded away from r. Thus, iog{qk/pi)/ log{qk/po) 
is bounded away from by a constant that only depends on r and r/o- We then derive. 



log(^ P-lUlog^ 

yPlJ \Pi J \PoJ \P0 



(60) 



Now, for the entropy Hp-^{qk), by Lemma 3 we have 



(Qk -Pif 



Pi 



A qk log 



Pi 



^-iVA^logf^ 



.Pi / Pi \Pi 
as log(l + x) < x. Since H{pi) ~ Pofir), we get by (59) and (60) 

rH{pi 



> pi ( — - 1 ) log ( — 

Pi ) \Pi 



Hp,{qk) 



>- 



f{r) \Pi 
H{qk 



^-ihogf* 
Pi 

Po 



— - r 1 log ( — 



fiQk/po) \Po 
y H{qk) 

y ^ \og{N/n) , 

where the third line follows from the fact that the qk/po is lower bounded away from r and that 
f{x) ~ xlog(x) when x — )■ oo. Thus, 



k-1 



HpM y log(iV/n) > log(n/fc) V 1 , 



as before. 
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CASE 3: pi/po — )• oo. As in the proof of Lemma 7 (Case 2), we have pi — )■ 0. We start as in the 
two previous cases, again using Lemma 3 to get the asymptotic expressions of the entropies. By 
(57), qt/pQ > pi/po oo, so that 



n 



H{qk) Qk log(g/c/Po) qk 

, rs^ rv^ 

{1 - rio){k - 1) - H{pi) Pilog(pi/po) Pi 



< 



1 + 



iog(gfcM 

log(pi/po 



(61) 



It follows that 1^(1 + iog(pi/po) ) > (1 - %) ^- Since log(pi/po) oo, we derive that q^/pi > 



(1 — r/o/2) ^ for n large enough. Since pi < qu < q, we have qk/pi < Q'/pi < Pi/po- It follows that 
log(gfc/pi)/log(pi/po) < li and therefore qk/pi > (1 + o(l))^ by (61). We conclude that 



qk ^ 
Pi ~ 



n 



V 



1 



2k 1 - 7?o/2 



(62) 



Turning to the entropy Hp^{qk), we have Hp-^{qk) > qk^og{qk/pi) - qk + {I - qk)Pi- Using 
Lemma 7 and Lemma 3, we get 



k-l 



Hpoiqk) ^°^\ k 



N 



log 



^ log _ (1 + 0(1)) 

log ( 2fe ^ V n ' 



Po 



We explaing above that q^/pi < q/pi < Pi/po, so that q^/po < [pi/pof, implying log(gfc/po) < 
21og(pi/po)- Applying (62), we get 

k - ^ -TT / N log(A/n) , I-,. ,1 
-^Hp,{qk) y 7 \ log n//c V 1 , 
2 log(pi/po) 

We saw in the proof of Lemma 6 (Case 3) that log(pi/po) = o[log{N/n)], so we conclude that 

k-l 



^Pi(%) >log(n/A;) VI 



7.4 Proof of Theorem 3 

We start with a couple of lemmas. 

Lemma 9. Under conditions (17), (18) and (19), we have 

(Pi - Pof 



nHp^{pi) 
hm sup — — , , ^ , , < 1 
21og(A/n) 



Pq 



A3/2 



. 



(63) 



As in the proof of Theorem 2, for n large enough, we may assume that there exists ryo > such 



that 



21og(iV/n) ^° 



(64) 



Lemma 10. Under conditions (17) and (64), we have 

{pi - pof 



N Po 



0(1) . 
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We consider the likelihood ratio under the uniform prior: 

-1 



^ ^ \S\=n 



and 



s 



exp 



,Ws - A(0pjn(2) + e„,(W - Ws) - (iV(2) - n(2))A(0p, )■ 



(65) 



(66) 



As in the proof of Theorem 2, we use a thresholded version of L' to prove that Eo[|L' — 1|] = o(l): 



L 



^ \S\=n 



where is defined in (46). As in the proof of Theorem 2, we prove that any subsequence of Eo[L— 1] 
has as an accumulation point. This allows us to assume that pi/po converges to r G [l,oo] and 
that pI/po converges to r2 S [0, oo]. To control Kq[L — 1], it suffices to prove that K^L = 1 + o(l) 
and that Eo[L'^] < 1 + o(l). 

First moment 

EoL = 7r[Eo[L'slrJ] =vr[lP'5(rs)] =P'5(rs) . 
As the proof of Theorem 2, we can show that P'5'(r5) = 1 + o(l) relying only on (19). 

Second Moment. It remains to prove that Eo[Z2] < 1 + o(l). Let 81,82 ~ vr and define 
K = \8in 82]. Observe that {Ws,ns„ Ws, + Ws, - 2Ws,nS2, W - Ws, - Ws, + Ws.ns,) are 
independent. Arguing as in the proof of Theorem 2, we decompose the square of the modified 
likelihood as follows. 

EoP = 7:^^[Eo{L's,L's,trstrs,)] 
< 7r®2 p . jj . jjjj 

where 
I 

II 
III 



Eq exp 
Eq exp 



26p,^{W - Ws, - Ws, + Ws,ns,) - 2A(^pj,) [N^^^ - 2n(2) + ^(2) 

{Ws, + Ws, - 2Ws,ns,) - 2 (a(^pJ + A(epj,)) (n^^) - K^^)) 



exp (2ep,Ws,ns, - 2A{9p,)K'^^A {Ws,ns, < wk} 



En 



All these expectations only depend on Si and 5*2 through K. 

The term III already appeared in the proof of Theorem 2, where we saw that III < exp 
for K < k^in, and that III < exp (A^i^*-^^) for K > Aimin where /cmin is defined in (45), while A 
and Ajt are defined in (50) and (51), respectively. 

Since the expectations inside I and II are not thresholded, we easily compute these terms: 



I = exp [(iV(2) - 2n(2) + K^^)) (A(20p, ) - 2A(e^, )) 



with 



A(20p.)-2A(e, 



J ^2 



logll + M^U (^'i-^'o) 



n 



(2) 



j5o(l-po)y po{l - po) \N'y'^) 
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and 
with 

Since (pi - p'q) -- 
I • II < exp 



II = exp 



(Po -Po)(Pi -Po) 
Po(l -Po) 



< 



(Pi -Po)(^'i -Po) ^^^^ 



Po(l -Po) 



iV(2) • 



(pi — ?'o)(l — "-''^V^*'^'*) ^, we derive 



(Pi -Po) 
Po(l -Po) 



+ 



V 



Ar(2) iv(2) 



i^(2) 



(n(2))2 



n(2) 



(67) 



By Lemma 10, An'^/N — >■ and by (63), An^/N'^^"^ — )■ 0. Hence, there exists 6 — )• oo such that 
A^62 ^ and ^ 0. Define A;^ = + and ko = [b^\. We can take b small 

enough to constrain kQ < n/2. 

To prove that Eg < 1 + o(l), we only need to show the four following results 



E 
E 



{K < A;[,}exp|AK(2)| 
{k'o<K < ko} exp |Ail'(2)| Vk 
{fco < A'< A:„,iJexp{AK(2)|y^ 

{A:min <K<n} exp [AkK^^^} Vk 



< l + o(l) 

= o{l), 

= o{l). 

= o{l). 



(69) 
(70) 
(71) 



By Lemma 10 and the definition (67) of Vk, we have log(Vfc) = o{k'^/N) = o{k) when k < n. 
As a consequence, the expectations in (70) and (71) are almost the same as the expectations 
E [{A;o < K < A;min}exp{A/f(2)}] and E [{/cminA' < n} exp { Ai^K(2) }] that we bounded in the 
proof of Theorem 2. This is made rigorous to establish the following result. 

Lemma 11. Under the entropy condition (64), the bounds (70) and (71) hold. 

In fact the main difference between the proof of Theorem 2 and the current proof lies in the 
control of the two expectations in (68) and (69). Here, we need to carefully upper bound Vk in 
order to balance AK^'^\ Using the identity log(l + x) < x, the property log(Vfc) < for k < n/2 
— easily verified from the definition (67) — and ko < n/2, we get 



Vk < exp 



A 



V 



(n(2))2 n 
+ 



(2) 



iV(2) iV(2) 



(n(2))2 

iV(2) 



iV(2) 



n(2) 

Ar(2) 



for k < ko- In the sequel, we note 
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so that 2A' ~ A. Thus, we get for any k < ko, 



exp 



{Afc(2)|y, < exp|2A' 



(n(2))_2 

iV(2) 



< exp{A'(.2_^)+2A'^ 



< (l + o(l))exp|A' 



n 



(72) 



since A'nViV^ ^ An^/N^ < ^i^^M^-^^ = o{n/N) = o(l) by Lemma 10. 
Using this upper bound (72), we consider the expectation in (68) 

,4 



E 



{K < A;o}exp|Ai<:(2)} 



-< exp 
-< exp 
< exp 



A' f //2 ^ 



, bn ,211? bn . 

Ari/2 \lv' ^ Ann) 



1 + 0(1) 



iVV2 V AT ivi/2 

since A'%^ ^ A%^ = o(l) and A'^ <C A|^ = o(l) by definition of b. We have proved (68). 
To prove (69), we apply the Cauchy-Schwarz inequality and we upper bound K hy k^ < bn'^/N, 



E 



< E 



{k',<K< ko} exp I A'(6 + 1)^ (^K - ^) I 
< Fy\K>k'o) EV2[exp|2A'(6 + l)^(i^-^ 



Recall that $K \sim \mathrm{Hyp}(N, n, n)$, so that $E K = n^2/N$ and $\mathrm{Var}(K) \le n^2/N$. Hence, by Chebyshev's
inequality, $P(K \ge k_0') \le 1/b^2 \to 0$.

We know from (Aldous, 1985, p. 173) that $K$ has the same distribution as the random variable
$E(W \mid \mathcal{B}_n)$ where $W$ is a binomial random variable of parameters $n$, $n/N$ and $\mathcal{B}_n$ some suitable
$\sigma$-algebra. By a convexity argument, we apply this to get



n 



Eexp<{ 2A'(6 + 1)— ( K 



n 
iV 



< 



< 



-2A'(fe+l)^ 



exp 



rr 



ATS' 



+ 1)2a'2 



< 1 + 0(1) , 
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since A'6(^ V -^^) = o(l) by definition of b. All in all, we have proved (69). 



7.4.1 Proof of Lemma 9 

The second convergence is a straightforward consequence of the definition of poi (18) and (19), so 
that we focus on the first result. Let us compute the difference between the two entropies Hp'^{pi) 
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and Hp^{pi). 



Hp'Api) - Hp^iPi] 



< 



< 



Pi log 



+ (1 - pi) log 



Pi 



Pi -Po 
P'o 



n(2) (pi-p'o)' 



1 -Po 



iV{2)p'o(l-p'o) 

Arguing as in the proof of Lemma 10, we note that, under conditions (17) and (64), 

Np'oil-P'o) 

so that Hp>^{pi) - Hp^pi) = o{l/N) = o (log(iV/n)/n), since n<N. 

7.4.2 Proof of Lemma 10 
CASE 1: pi/po 1- By condition (64), 

{Pi - Po? 



n 



N po{l -po] 
CASE 2: pi/po c G (l,oo). Similarly, 

n {pi - Po)^ 



2H{p,)-^log - - = o{l 



N 



N\ n 



n N 



AT ft ^ Po c-1) — 



n 



CASE 3: pi/po — > oo. We have 



-< H{pi)— -< log 



(pi ~ Po? Pi 
Npo{l-po) Po N 



N\ n 
) N 



0(1) . 



By condition (64) and Pi log(pi/j>o) H{pi) -< Mog(A^/n). Dividing this inequality by po and 
then taking the logarithm leads to log(j)i/po) ^ loglog(A^/n) + log(l/npo) = o(log(A^/n)) by (17). 
It follows that pi/po = o{^J N/n) and pi = o(log(A^/n)/n). All in all, we conclude that 



2 9 

PI 



log(A^/n) 



n 




N 



0(1) . 



7.4.3 Proof of Lemma 11 

Let us first consider (71). Using the upper bound log(Vfc) = o{k), we only have to prove that 



E 



{K>k^ir.}exp(AKK^^^ + o{K 



o(l) 



We have shown in the proof of Theorem 2 (only using the entropy condition) that 



E 



k Au. 



k - 1 



log ( — ) + 1 
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tends to zero since all the terms log ( ^ ) + 1 simultaneously go to — oo for k = [kmml + 

1, . . . , n. Consequently, 



E 



k= [/i^minj H~l 



MA,^-log(A)+i + ,(i) 



also tends to zero. 



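Let us turn to (70), following again the same arguments as in the proof of Theorem 2:

\[
\mathbb{E}\left[\mathbb{1}\{k_0 < K \le \lfloor k_{\min}\rfloor\}\,\exp\left(\Delta K^{(2)} + o(K)\right)\right]
\le \sum_{k=k_0+1}^{\lfloor k_{\min}\rfloor} \exp\left[\Delta k^{(2)} + o(k) - nH_\rho\left(\frac{k}{n}\right)\right]
\le \sum_{k=k_0+1}^{\lfloor k_{\min}\rfloor} \exp\left(k\left[\frac{\Delta}{2}(k-1) + o(1) - \log\left(\frac{k}{n\rho}\right) + 1\right]\right) .
\]

Hence, as in the previous proof, we only need to prove that

\[
\omega := \min\left[\log b - \Delta k_0/2,\ \log\left(\frac{k_{\min}}{n\rho}\right) - \Delta k_{\min}/2\right]
\]

goes to $\infty$. By definition of $k_0$, we have $\Delta k_0 = o(1)$, while we showed in the previous proof that $\log\left(\frac{k_{\min}}{n\rho}\right) - \Delta k_{\min} \to \infty$. With this, we conclude.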



7.5 Proof of Proposition 2 

We start with a useful result for proving that a test is asymptotically powerful based on the first 
two moments of the corresponding test statistic. 



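Lemma 12. Suppose that for testing $H_0$ versus $H_1$, a statistic $T$ satisfies

\[
R_T := \frac{\mathbb{E}_1(T) - \mathbb{E}_0(T)}{\max\left(\sqrt{\operatorname{Var}_1(T)},\ \sqrt{\operatorname{Var}_0(T)}\right)} \to \infty . \qquad (73)
\]

Then there is a test based on $T$ that is asymptotically powerful.

Proof. Consider the test that rejects when $T \ge \mathbb{E}_0(T) + \sqrt{R_T \operatorname{Var}_0(T)}$. By Chebyshev's inequality, the probability of type I error tends to zero:

\[
\mathbb{P}_0\left(T \ge \mathbb{E}_0(T) + \sqrt{R_T \operatorname{Var}_0(T)}\right) \le \frac{1}{R_T} \to 0 .
\]

For the probability of type II error, we have

\[
\mathbb{P}_1\left(T \ge \mathbb{E}_0(T) + \sqrt{R_T \operatorname{Var}_0(T)}\right) = \mathbb{P}_1\left(\frac{T - \mathbb{E}_1(T)}{\sqrt{\operatorname{Var}_1(T)}} \ge -\gamma\right) \ge 1 - \frac{1}{\gamma^2} ,
\]

where

\[
\gamma := \frac{R_T \max\left(\sqrt{\operatorname{Var}_1(T)},\ \sqrt{\operatorname{Var}_0(T)}\right) - \sqrt{R_T \operatorname{Var}_0(T)}}{\sqrt{\operatorname{Var}_1(T)}} \to \infty .
\]

□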
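The construction in this proof can be sketched numerically. The snippet below is a minimal illustration, not part of the original argument: given the two means and variances of a statistic $T$, it returns the ratio $R_T$, the rejection threshold $\mathbb{E}_0(T) + \sqrt{R_T \operatorname{Var}_0(T)}$, and the two Chebyshev bounds $1/R_T$ and $1/\gamma^2$. The function name `chebyshev_test` and its return layout are hypothetical.

```python
import math


def chebyshev_test(e0, e1, var0, var1):
    """Threshold test built from first and second moments of T.

    Returns (R_T, threshold, bound on type I error, bound on type II error).
    """
    r = (e1 - e0) / max(math.sqrt(var1), math.sqrt(var0))
    if r <= 0:
        raise ValueError("the mean under H1 must exceed the mean under H0")
    threshold = e0 + math.sqrt(r * var0)
    type1_bound = min(1.0, 1.0 / r)
    # gamma is the standardized gap between E_1(T) and the threshold.
    gamma = (e1 - threshold) / math.sqrt(var1)
    type2_bound = min(1.0, 1.0 / gamma ** 2) if gamma > 0 else 1.0
    return r, threshold, type1_bound, type2_bound


print(chebyshev_test(e0=0.0, e1=100.0, var0=1.0, var1=1.0))
```

Both error bounds vanish as $R_T$ grows, which is the content of the lemma; for moderate $R_T$ the Chebyshev bounds are typically conservative.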
We now apply Lemma 12 to the total degree test. From (12), under the null,

\[
\mathbb{E}_0(W) = \frac{N(N-1)}{2}\,p_0, \qquad \operatorname{Var}_0(W) = \frac{N(N-1)}{2}\,p_0(1-p_0),
\]

while under the alternative,

\[
\mathbb{E}_1(W) = \frac{N(N-1)}{2}\,p_0 + \frac{n(n-1)}{2}\,(p_1-p_0)
\]

and

\[
\operatorname{Var}_1(W) = \frac{N(N-1)}{2}\,p_0(1-p_0) + \frac{n(n-1)}{2}\,\left[p_1(1-p_1) - p_0(1-p_0)\right] .
\]

In any case,

\[
\max\left(\operatorname{Var}_1(W), \operatorname{Var}_0(W)\right) \le \tfrac{1}{2}N^2 p_0 + \tfrac{1}{2}n^2(p_1-p_0) .
\]

Recalling the definition of $R_W$ in (73), under (13) we have

\[
R_W \ge \frac{n(n-1)(p_1-p_0)}{\sqrt{2N^2p_0 + 2n^2(p_1-p_0)}} \asymp \frac{n^2}{N}\,\frac{p_1-p_0}{\sqrt{p_0}} \wedge n\sqrt{p_1-p_0} \to \infty .
\]

Therefore, the total degree test is powerful when (13) holds.
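To make the moment computation concrete, here is a small sketch that evaluates the mean and variance of the total edge count $W$ under both hypotheses (sums of independent Bernoulli edges, $N(N-1)/2$ pairs in total and $n(n-1)/2$ pairs inside the community), together with the exact ratio $R_W$ and a lower bound obtained from the variance bound $\tfrac12 N^2p_0 + \tfrac12 n^2(p_1-p_0)$. Function names and the numerical values are illustrative assumptions.

```python
import math


def total_degree_moments(N, n, p0, p1):
    """Means and variances of the total edge count W under H0 and H1."""
    N2 = N * (N - 1) / 2  # number of node pairs in the graph
    n2 = n * (n - 1) / 2  # number of node pairs inside the community
    e0 = N2 * p0
    var0 = N2 * p0 * (1 - p0)
    e1 = e0 + n2 * (p1 - p0)
    var1 = var0 + n2 * (p1 * (1 - p1) - p0 * (1 - p0))
    return e0, var0, e1, var1


def signal_to_noise(N, n, p0, p1):
    """Return (exact R_W, lower bound on R_W), assuming p1 >= p0."""
    e0, var0, e1, var1 = total_degree_moments(N, n, p0, p1)
    r_w = (e1 - e0) / math.sqrt(max(var0, var1))
    lower = n * (n - 1) * (p1 - p0) / math.sqrt(
        2 * N ** 2 * p0 + 2 * n ** 2 * (p1 - p0)
    )
    return r_w, lower


print(signal_to_noise(N=1000, n=100, p0=0.1, p1=0.2))
```

With these particular values the ratio is only slightly above 2, so the total degree test would have little power; the ratio grows when $n^2(p_1-p_0)/(N\sqrt{p_0})$ grows.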



7.6 Proof of Proposition 3 

We use the union bound, Chernoff's bound (28) and (30) to get

\[
\mathbb{P}_0\left(W_n^* \ge a n^{(2)}\right) \le \binom{N}{n} \exp\left(-n^{(2)} H(a)\right) \le \exp\left(n\log(Ne/n) - n^{(2)} H(a)\right),
\]

which goes to zero when

\[
\log(N/n) - \frac{n-1}{2}\,H(a) \to -\infty . \qquad (74)
\]

Choose $a = \eta p_0 + (1-\eta)p_1$ with $\eta \in (0,1)$ fixed, sufficiently small that

\[
\liminf \frac{nH(a)}{2\log(N/n)} > 1 .
\]

This is possible because of how $H$ varies, which is described in Lemma 3. We then consider the test that rejects when $W_n^* \ge a n^{(2)}$. We just chose $a$ so that its level tends to zero. Under the alternative, let $S$ denote the community. By definition, $W_n^* \ge W_S$, and since $W_S \sim \operatorname{Bin}(n^{(2)}, p_1)$ and $p_1 n^{(2)} \to \infty$, $W_S = p_1 n^{(2)} + O_P\left(\sqrt{p_1 n^{(2)}}\right)$. Therefore, the test is powerful when $p_1 - a \gg \sqrt{p_1/n^{(2)}}$. Since $p_1 - a = \eta(p_1-p_0)$ and $\eta$ is constant, this is the same as $(p_1-p_0)n^2 \gg \sqrt{p_1 n^2}$. Now, if $p_1/p_0$ is bounded away from 1, this is true because $p_1 - p_0 \asymp p_1$ and $p_1 n^2 \to \infty$; while if $p_1/p_0 \to 1$, we use Lemma 3 and (15) to get that $(p_1-p_0)^2 n/p_0 \ge \text{cst}\,\log(N/n)$, implying that $(p_1-p_0)n^2/\sqrt{p_1 n^2} \sim (p_1-p_0)n/\sqrt{p_0} \to \infty$.
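The level bound of the scan test can be evaluated on the log scale. The sketch below assumes that $H(a)$ denotes the relative entropy $H_{p_0}(a) = a\log(a/p_0) + (1-a)\log((1-a)/(1-p_0))$, computes $n\log(Ne/n) - n^{(2)}H(a)$ for $a = \eta p_0 + (1-\eta)p_1$, and checks the combinatorial inequality $\binom{N}{n} \le (Ne/n)^n$ used in the union bound. The helper names and the parameter values are hypothetical.

```python
import math


def rel_entropy(q, p):
    """H_p(q): relative entropy of Bernoulli(q) with respect to Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def log_binom(N, n):
    """Natural logarithm of the binomial coefficient C(N, n)."""
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)


def log_level_bound(N, n, p0, p1, eta):
    """log of the bound exp(n log(Ne/n) - n^(2) H(a)) on the scan test level."""
    a = eta * p0 + (1 - eta) * p1  # density threshold between p0 and p1
    n2 = n * (n - 1) / 2
    return n * math.log(N * math.e / n) - n2 * rel_entropy(a, p0)


print(log_level_bound(N=10_000, n=100, p0=0.1, p1=0.5, eta=0.1))
```

A strongly negative value means the union bound already forces the level to be tiny at these parameters; a positive value means the bound is uninformative there.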
7.7 Proof of Proposition 4

The arguments are based on cumbersome but pedestrian moment calculations.

Under the null. We first show that $V^*$ remains bounded under the null. Rewrite $V$ as



V 



N 



^ E (Wi. -{N- l)pof + {po - Po)\N - 1 



i=l 

+ {po-Po){N-l) 



N{N-1) iV(2) 



+ 



iV(2) - 1 



iV(2) -1 



-l + 2po)-(iV-l) 



iV(2) 



iV(2) - 1 



Po(l -Po) 



(75) 



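Since $\mathbb{E}_0(\hat p_0 - p_0)^2 = [N^{(2)}]^{-1} p_0(1-p_0)$ and $\mathbb{E}_0(W_{i\cdot} - (N-1)p_0)^2 = (N-1)p_0(1-p_0)$, it follows that $\mathbb{E}_0 V = 0$. For the variance, we have

\[
\operatorname{Var}_0\left[(\hat p_0 - p_0)^2\right] \le \frac{2p_0^2(1-p_0)^2}{(N^{(2)})^2} + \frac{p_0(1-p_0)}{(N^{(2)})^3}
\]

and

\[
\operatorname{Var}_0\left[\sum_{i=1}^{N} \left(W_{i\cdot} - (N-1)p_0\right)^2\right] = 2N(N-1)\left[(N-3)p_0^2(1-p_0)^2 + p_0(1-p_0)\left[p_0^3 + (1-p_0)^3\right]\right] .
\]

Hence, we get

\[
\operatorname{Var}_0(V) \prec Np_0^2 + p_0 \prec Np_0^2 ,
\]

since $p_0 \succ 1/N$. Therefore, by Chebyshev's inequality, $V = O_P(\sqrt{N}p_0)$. Under the null, $N^{(2)}\hat p_0 = W \sim \operatorname{Bin}(N^{(2)}, p_0)$, and because $N^{(2)}p_0 \to \infty$, we have $\hat p_0 \ge \tfrac{1}{2}p_0$ with probability tending to 1 as $N \to \infty$. We conclude that, under the null, $V^* = O_P(1)$.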
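The closed-form variance of $\sum_i (W_{i\cdot} - (N-1)p_0)^2$ under the null can be checked exactly on very small graphs. The following sketch enumerates all graphs on $N$ nodes with exact rational arithmetic and compares the resulting mean and variance with $N(N-1)p_0(1-p_0)$ and $2N(N-1)[(N-3)p_0^2(1-p_0)^2 + p_0(1-p_0)(p_0^3+(1-p_0)^3)]$. It is only a verification aid; the function names are hypothetical and the enumeration is feasible only for tiny $N$.

```python
from fractions import Fraction
from itertools import combinations, product


def exact_moments(N, p):
    """Exact mean and variance of sum_i (W_i. - (N-1)p)^2 under G(N, p)."""
    edges = list(combinations(range(N), 2))
    q = 1 - p
    first = Fraction(0)
    second = Fraction(0)
    for bits in product((0, 1), repeat=len(edges)):
        prob = Fraction(1)
        degree = [0] * N
        for (i, j), b in zip(edges, bits):
            prob *= p if b else q
            degree[i] += b
            degree[j] += b
        stat = sum((d - (N - 1) * p) ** 2 for d in degree)
        first += prob * stat
        second += prob * stat ** 2
    return first, second - first ** 2


def closed_form(N, p):
    """Closed-form mean and variance of the same statistic."""
    q = 1 - p
    mean = N * (N - 1) * p * q
    var = 2 * N * (N - 1) * ((N - 3) * p ** 2 * q ** 2 + p * q * (p ** 3 + q ** 3))
    return mean, var


p = Fraction(3, 10)
print(exact_moments(4, p) == closed_form(4, p))
```

Passing `Fraction` values for `p` keeps every quantity exact, so the comparison is an identity check rather than a floating-point approximation.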

Under the alternative. Turning to the alternative hypothesis, we shall prove that $V^*$ tends to infinity with high probability by showing that $\mathbb{E}_1'(V) \gg \sqrt{N}p_0 \vee \sqrt{\operatorname{Var}_1'(V)}$, since $\sqrt{N}\hat p_0 = O_{\mathbb{P}_1'}(\sqrt{N}p_0)$. The expression (75) of $V$ still holds.

By definition of $p_0 = p_0' + \frac{n^{(2)}}{N^{(2)}}(p_1 - p_0')$, we have $\mathbb{E}_1'(\hat p_0) = p_0$. Furthermore,



E'i[(po-po)': 

Y^(W,.-{N-l)p,) 



rPo(l -Po) 



n 



N 



■4 = 1 



' J iV(2) 

iV(iV-l)po(l-Po) ~ n^(pi-Po) 



/ \2 



Plugging this into (75), we get



K[v]-{pi-p'of 



N 



(76) 



By (22), $\mathbb{E}_1'[V] \gg \sqrt{N}p_0'$ and $\mathbb{E}_1'[V] \gg n^2 N^{-3/2}(p_1 - p_0')$, and it follows that $\mathbb{E}_1'[V] \gg \sqrt{N}p_0$. To conclude, we need to control the variance of $V$ under $\mathbb{P}_1'$. Tedious computations lead us to



Var'i [po - Pq] < 



Po 

iV2^ 



Var'i [(j5o - po) 



Var'i 



N 



(Wi. -{N- l)po 



21 . ^ , ^ and 
Ar4 + Are ' ^"^^ 



-< N'po + iV^po^ + n^(pi - Po)' + n^iVpo(pi - p'o)' + n*(pi - Po) 



1=1 



' ^2 



/ \2 



J ^3 
so that, using the fact that $p_0 \succ 1/N$, we get

S 4 

Var'jy] -< A^po' + ^Po(pi - p'of + J^^Pi - Pof ■ 
We conclude that $\mathbb{E}_1'[V] \gg \sqrt{\operatorname{Var}_1'(V)}$ as soon as the following conditions are met

3 3/2 

We already argued that the first one holds, while the second and third are easily seen to be implied by the first condition and the fact that $p_0 \succ 1/N$.

7.8 Proof of Proposition 5 

It suffices to show that the scan test is asymptotically powerful for $H_0$ versus $H_1$, where the model under $H_0$ is $\mathbb{G}(N, p_0)$. In view of Proposition 3, it is therefore enough to prove that, under (23), we have

\[
\liminf \frac{nH_{\hat p_0}(p_1)}{2\log(N/n)} > 1 .
\]

First note that $\hat p_0 = W/N^{(2)}$ is concentrated around its mean. Indeed, we have

\[
\mathbb{E}\left[N^{(2)}\hat p_0\right] = (N^{(2)} - n^{(2)})p_0 + n^{(2)}p_1 = N^{(2)}p_0 + n^{(2)}(p_1 - p_0),
\]

and

\[
\operatorname{Var}\left[N^{(2)}\hat p_0\right] = (N^{(2)} - n^{(2)})p_0(1-p_0) + n^{(2)}p_1(1-p_1) \le \mathbb{E}\left[N^{(2)}\hat p_0\right] .
\]

Hence, by Chebyshev's inequality,

\[
\hat p_0 = p_0 + a + O_P\left(\frac{1}{N}\sqrt{p_0 + a}\right), \qquad a := \frac{n^{(2)}}{N^{(2)}}(p_1 - p_0) .
\]

Since $p_0 \gg N^{-2}$, we have $\sqrt{p_0}/N = o(p_0)$. If $a \ge p_0$, then $p_0 \gg N^{-2}$ implies that $\sqrt{a}/N = o(a)$. All in all, we get $\sqrt{p_0 + a}/N = o(p_0 + a)$ and $\hat p_0 \sim_P p_0 + a$. As in the previous proofs, we can assume that $p_1/p_0 \to r \in [1, \infty]$. In the three following cases, we prove that

\[
\liminf \frac{nH_{\hat p_0}(p_1)}{2\log(N/n)} > 1 .
\]

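CASE 1: $p_1/p_0 \to 1$. In that case, we have $a = o(p_0)$ and $\sqrt{p_0}/N = o(p_1 - p_0)$ since

\[
\frac{(p_1 - p_0)^2}{p_0} \asymp H_{p_0}(p_1) \succ \frac{\log(N/n)}{n} .
\]

Hence, $\hat p_0 - p_0 = o_P(p_1 - p_0)$ and we conclude that

\[
H_{\hat p_0}(p_1) \sim_P \frac{(p_1 - \hat p_0)^2}{2\hat p_0(1-\hat p_0)} \sim_P \frac{(p_1 - p_0)^2}{2p_0(1-p_0)} \sim H_{p_0}(p_1) .
\]

CASE 2: $p_1/p_0 \to r \in (1, \infty)$. Hence, $a = o(p_0)$ and $\hat p_0 \sim_P p_0$. It follows that

\[
H_{\hat p_0}(p_1) \sim_P p_0\left(r\log(r) - r + 1\right) \sim_P H_{p_0}(p_1) .
\]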
CASE 3: $p_1/p_0 \to \infty$. Since $\hat p_0 \sim_P p_0 + a$, we derive

\[
H_{\hat p_0}(p_1) \sim_P p_1\log\left(\frac{p_1}{p_0 + a}\right) \ge p_1\log\left(\frac{p_1}{2p_0} \wedge \frac{p_1}{2a}\right) \sim_P H_{p_0}(p_1) \wedge 2p_1\log\left(\frac{N}{n}\right) .
\]

It remains to prove that $\liminf np_1 > 1$ when $\liminf nH(p_1)/\log(N/n) > 2$. Assume that $\liminf np_1 \le 1$, so that there exists a subsequence satisfying

\[
\lim np_1 \le 1 \quad \text{and} \quad \liminf\, np_1\,\frac{\log(p_1/p_0)}{\log(N/n)} > 2 .
\]

It follows that $\liminf \log(1/(np_0))/\log(N/n) > 2$ and $\limsup N^2p_0/n \le 1$, which contradicts the assumption of the proposition.
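The concentration of $\hat p_0 = W/N^{(2)}$ around $p_0 + a$ used at the start of this proof can be illustrated by simulation. The sketch below is an assumption-laden toy: it draws one graph with a planted community on the first $n$ nodes using a seeded generator, and returns the empirical edge density together with the target $p_0 + a$, where $a = (n^{(2)}/N^{(2)})(p_1 - p_0)$. The function name and all parameter values are hypothetical.

```python
import math
import random


def estimate_p0(N, n, p0, p1, seed=0):
    """Simulate a planted-community graph and return (p0_hat, p0 + a)."""
    rng = random.Random(seed)
    edges = 0
    for i in range(N):
        for j in range(i + 1, N):
            # Since i < j, both endpoints lie in S = {0, ..., n-1} iff j < n.
            prob = p1 if j < n else p0
            edges += rng.random() < prob
    N2 = N * (N - 1) / 2
    n2 = n * (n - 1) / 2
    a = n2 / N2 * (p1 - p0)
    return edges / N2, p0 + a


p0_hat, target = estimate_p0(N=300, n=30, p0=0.1, p1=0.5)
print(p0_hat, target)
```

By Chebyshev's inequality the deviation is of order $\sqrt{p_0 + a}/N$, which is small relative to $p_0 + a$ as soon as $p_0 \gg N^{-2}$.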



7.9 Proof of Proposition 6 

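We prove the result when $p_0$ is known. The situation when $p_0$ is unknown can be dealt with in a similar way; see, for example, the proof of Proposition 5. Let $B = W^2$. We first bound $\mathrm{SDP}_n(B)$ from below under the alternative where $S$ is the anomalous subset of indices. We have

\[
\mathrm{SDP}_n(B) \ge \frac{1}{n}\,\mathbf{1}_S^\top B\,\mathbf{1}_S = \frac{1}{n}\sum_{i,j\in S}\sum_{k=1}^{N} W_{ik}W_{jk} .
\]

We have

\[
\mu_S := \mathbb{E}_S\left[\frac{1}{n}\sum_{i,j\in S}\sum_{k=1}^{N} W_{ik}W_{jk}\right]
= \left[(n-1)p_1 + (N-n)p_0\right] + (n-1)\left[(n-2)p_1^2 + (N-n)p_0^2\right]
\]
\[
= (N-1)p_0 + (n-1)(p_1-p_0) + (n-1)(N-2)p_0^2 + (n-1)(n-2)(p_1^2 - p_0^2),
\]

and, after some tedious but straightforward calculations,

\[
\sigma_S^2 := \operatorname{Var}_S\left(\frac{1}{n}\sum_{i,j\in S}\sum_{k=1}^{N} W_{ik}W_{jk}\right) = O\left(\frac{N}{n}\,p_0(1-p_0)\left[1 + (np_0)^2\right] + p_1(1-p_1)\left[1 + (np_1)^2\right]\right) .
\]

By Chebyshev's inequality, under the alternative, $\mathrm{SDP}_n(B) \ge \mu_S - O_P(\sigma_S)$.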
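The expectation of the quadratic form $\frac1n\sum_{i,j\in S}\sum_k W_{ik}W_{jk}$ can be verified exactly on a tiny graph. The sketch below assumes a simple undirected graph with no self-loops, takes $S = \{0,\dots,n-1\}$, enumerates all graphs with exact rational arithmetic, and compares the result with the two closed-form expressions $[(n-1)p_1 + (N-n)p_0] + (n-1)[(n-2)p_1^2 + (N-n)p_0^2]$ and $(N-1)p_0 + (n-1)(p_1-p_0) + (n-1)(N-2)p_0^2 + (n-1)(n-2)(p_1^2-p_0^2)$. Function names are hypothetical and the enumeration only scales to a handful of nodes.

```python
from fractions import Fraction
from itertools import combinations, product


def mu_forms(N, n, p0, p1):
    """The two closed-form expressions for the mean of the quadratic form."""
    first = ((n - 1) * p1 + (N - n) * p0) + (n - 1) * (
        (n - 2) * p1 ** 2 + (N - n) * p0 ** 2
    )
    second = (
        (N - 1) * p0
        + (n - 1) * (p1 - p0)
        + (n - 1) * (N - 2) * p0 ** 2
        + (n - 1) * (n - 2) * (p1 ** 2 - p0 ** 2)
    )
    return first, second


def exact_mean(N, n, p0, p1):
    """E[(1/n) sum_{i,j in S} (W^2)_{ij}] by enumerating every graph."""
    edges = list(combinations(range(N), 2))
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(edges)):
        prob = Fraction(1)
        W = [[0] * N for _ in range(N)]
        for (i, j), b in zip(edges, bits):
            pe = p1 if j < n else p0  # i < j, so the edge is in S iff j < n
            prob *= pe if b else 1 - pe
            W[i][j] = W[j][i] = b
        # sum_{i,j in S} sum_k W_ik W_jk = sum_k (sum_{i in S} W_ik)^2
        quad = sum(sum(W[i][k] for i in range(n)) ** 2 for k in range(N))
        total += prob * Fraction(quad, n)
    return total


p0, p1 = Fraction(1, 5), Fraction(1, 2)
print(exact_mean(5, 3, p0, p1), mu_forms(5, 3, p0, p1))
```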

Under the null, we bound $\mathrm{SDP}_n(B)$ from above as Berthet and Rigollet (2012) do. Specifically, they use a result of Bach et al. (2010), which says that

\[
\mathrm{SDP}_n(B) = \min_U\ \lambda^{\max}(B + U) + n|U|_\infty ,
\]

where the minimum is over symmetric matrices $U = (U_{ij})$ and $|U|_\infty := \max_{i,j}|U_{ij}|$. Similar to what Berthet and Rigollet (2012) do, we apply this identity to $U = (U_{ij})$ with $U_{ij} = -B_{ij}\mathbb{1}\{|B_{ij}| \le z\}$, obtaining

\[
\mathrm{SDP}_n(B) \le \lambda^{\max}(T_z(B)) + nz ,
\]

where $T_z(B)$ is the hard thresholding of $B$ at threshold $z$, meaning the matrix with coefficients equal to $B_{ij}\mathbb{1}\{|B_{ij}| > z\}$. Under the null, we have

\[
B_{ii} = \sum_k W_{ik}^2 \sim \operatorname{Bin}(N-1, p_0),
\]

and, when $i \ne j$,

\[
B_{ij} = \sum_k W_{ik}W_{jk} \sim \operatorname{Bin}(N-2, p_0^2) .
\]

Fix $\epsilon > 0$. Using Bernstein's inequality (Lemma 2) and the union bound, we find that the following inequalities happen simultaneously with probability tending to one under the null:

\[
\max_i B_{ii} \le (N-1)p_0 + x_0, \qquad x_0 := \sqrt{(1+\epsilon)2Np_0(1-p_0)\log N} + (1+\epsilon)\log(N),
\]

\[
\max_{i\ne j} B_{ij} \le (N-2)p_0^2 + x_{00}, \qquad x_{00} := 2\sqrt{(1+\epsilon)Np_0^2(1-p_0^2)\log N} + 2(1+\epsilon)\log(N) .
\]

Hence, choosing $z = (N-2)p_0^2 + x_{00}$, we have

\[
\mathrm{SDP}_n(B) \le C := (N-1)p_0 + x_0 + n(N-2)p_0^2 + nx_{00},
\]

with high probability under the null. In order to conclude, we need to prove that $\mu_S - O(\sigma_S) > C$ with probability going to one.

Before proceeding, we note that (25) implies that, for some $\eta > 0$,

\[
np_1^2 \ge 2(1+\eta)\,p_0\sqrt{N\log N} ,
\]

and (1) implies that either $np_0 \ge 1$, or $(N/n)^{-a} \le np_0 \le 1$, for some sequence $a \to 0$. In particular, this implies

> npoy/Nlog{N) > y/N{n/Nf , 

so that n > A^/^-a^ n a,lso follows that n^/p^ > y/n{N/n)~°- —?■ 00. 
We have — C > (1 + o(l))n^pf - NpQ — xq - jixqo, with 



n 



"^pI 2{l + vi)npQ^JN\ogN n^b^ n'^ .JT^^ {N / nf ^ r - 

> — 7= > 7= > V log iV 00 , 



Npl Npl ViVpo VN 

Xn 11 
— ^ < — + ^ 



nxoo n^pQ n 



nxQo ^ 2n^/{l + e)NppogN + 2(1 + e)nlog{N) _ 1+g , / (l + £)logiV _ 1 + ^ , 
n2p2 - 2(1 + 7?)npo\/iVbgiV ~ 1 + ?? y (1 + T?)iVpo ~ 1 + ^ 

since iVpg > Nn~'^{N/n)~'^^ > iV^'^^"^ with 2t — 2a — ^ 2t > 0. Assuming that r/ > e, it remains to 
show that n'^pf ^ (75 to prove that ns — 0{as) > C with probability going to one in the asymptote. 
We have ag x Npo/n + Nnp^ + v?'p\, and 

ri^pl 



VNpoJ 



n 



y n^/p^^ n log iV 00 , 



since nJp^ — )• 00, and also 



and 



7=4^ ^ ^/7ilog(iV)/po ^ 00 



— ny^pi > npi 00 . 
7.10 Proof of Proposition 7

The first result follows from a simple consequence of Bernstein's inequality for binomial random variables and the union bound. Details are omitted. Let us concentrate on the second bound. It suffices to prove that, with probability $\mathbb{P}_S$ going to one, $\max_{i\in S} W_{i\cdot} < \max_{i\in S^c} W_{i\cdot}$, since the distribution of $\max_{i\in S^c} W_{i\cdot}$ under $\mathbb{P}_S$ is stochastically smaller than the distribution of $\max_{i=1,\dots,N} W_{i\cdot}$ under $\mathbb{P}_0$. Since $\limsup \log(n)/\log(N) < 1$, we can assume that $n \le N^{1-\epsilon}$ for some $\epsilon > 0$. Condition (1) also enforces $p_0 \gg \log(N)/N$. Since the power of the maximal degree test is increasing with respect to $p_1$, we can assume that $p_1$ satisfies Condition (27) but is still large enough so that $p_1 \gg \log(n)/n$.

Fix $\delta > 0$ arbitrarily small. Applying Bernstein's inequality (Lemma 2) and using $Np_0 \gg \log(N)$ and $np_1 \gg \log(n)$, we derive that

\[
\max_{i\in S} W_{i\cdot} - (N-1)p_0 - (n-1)(p_1-p_0)
\le \sqrt{2(1+\delta)(N-1)p_0(1-p_0)\log(n)} + \sqrt{(2+\delta)np_1(1-p_1)\log(n)}
\]
\[
\le \sqrt{2(1+\delta)(N-1)p_0(1-p_0)\log(n)}\,(1 + o(1)) \qquad (77)
\]

with probability going to one, since we assume that $n(p_1 - p_0) = o\left(\sqrt{N\log(N)p_0}\right) = o(Np_0)$.

Let us consider a subset $T \subset S^c$ of size $N^{1-\kappa}$ for some $\kappa > 0$. As the $W_{i\cdot}$ are not independent, it is not straightforward to directly lower bound their supremum. This is why we compare it to independent variables. Let us call $i^*$ the smallest $i$ in $T$ that achieves $\max_{i\in T}\sum_{j\in T^c} W_{ij}$. Then

\[
\max_{i\in S^c} W_{i\cdot} \ge \max_{i\in T} W_{i\cdot} \ge \max_{i\in T}\sum_{j\in T^c} W_{ij} + \sum_{j\in T} W_{i^*j} .
\]

Observe that the first term is the supremum of $|T|$ independent binomial variables and that the second term follows a binomial distribution with parameters $p_0$ and $|T|$. With probability going to one, we have $\sum_{j\in T} W_{i^*j} \ge |T|p_0 - \sqrt{|T|p_0(1-p_0)\log(|T|)}$. Let us turn to the supremum of independent binomial distributions. We start from $\mathbb{P}(\operatorname{Bin}(n,p) = k) = p^k(1-p)^{n-k}\binom{n}{k}$. Consider $p$ bounded away from 1 and $k \ge np$ such that $k/n$ is also bounded away from one. Using the Stirling formula $\sqrt{2\pi n}\,(n/e)^n \le n! \le \sqrt{2\pi n}\,(n/e)^n e^{1/(12n)}$, we get

F{Bm{n,p) = k + i) > exp[-nHp{k/n)] ^ fp{l-k/n 



P(Bin(n,p) > /c) >- exp[-nHp{k/n)] 



2^/2^ \k/n{l-p)J V k + i 
1 k/n{l — p) 



^ k/n-p 



p{l — kju} ^ ^ 



k/n{l — p) 



where we have summed the first inequality for $i = 0, \dots, \sqrt{k} - 1$. Applying this lower bound to $\sum_{j\in T^c} W_{ij}$ and using Lemma 3, we derive that

\[
\mathbb{P}\left[\sum_{j\in T^c} W_{ij} \ge (N-1-|T|)p_0 + \sqrt{2(1-\delta)(N-|T|-1)p_0(1-p_0)\log(|T|)}\right] \succ \frac{|T|^{-(1-\delta)}}{\sqrt{(1-\delta)\log(|T|)}} .
\]

Since the random variables $\sum_{j\in T^c} W_{ij}$ for $i \in T$ are independent, it follows that

\[
\sup_{i\in T}\sum_{j\in T^c} W_{ij} \ge (N-1-|T|)p_0 + \sqrt{2(1-\delta)(N-|T|-1)p_0(1-p_0)\log(|T|)} ,
\]

with probability going to one. All in all, we derive that with probability going to one

\[
\max_{i\in S^c} W_{i\cdot} \ge (N-1)p_0 + \sqrt{2(1-\delta)(N-1)p_0(1-p_0)(1-\kappa)\log(N)} - 2\sqrt{2N^{1-\kappa}p_0(1-p_0)\log(N)} ,
\]

where the last term is negligible compared with the second term. Comparing this last lower bound with (77) and taking $\kappa$ and $\delta$ small enough allows us to conclude.
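The two ingredients of the binomial lower bound, namely Stirling's two-sided estimate and the comparison of binomial probabilities with $\exp[-nH_p(k/n)]$, can be checked numerically. The sketch below verifies the Stirling bounds on the log scale and compares an exact binomial tail with the Chernoff upper bound $\exp[-nH_p(k/n)]$ and with a point-mass lower bound of the form $\sqrt{n/(8k(n-k))}\exp[-nH_p(k/n)]$. That last inequality is a standard consequence of Stirling's formula and is used here as a stand-in for the garbled display above, not as the paper's exact statement; all function names are hypothetical.

```python
import math


def rel_entropy(q, p):
    """H_p(q): relative entropy of Bernoulli(q) with respect to Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def stirling_log_bounds(n):
    """Return (lower, log n!, upper) for Stirling's two-sided bound."""
    lower = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return lower, math.lgamma(n + 1), lower + 1 / (12 * n)


def binom_pmf(n, p, k):
    """Exact probability that Bin(n, p) equals k."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_tail(n, p, k):
    """Exact probability that Bin(n, p) is at least k."""
    return sum(binom_pmf(n, p, j) for j in range(k, n + 1))


n, p, k = 50, 0.2, 20
chernoff = math.exp(-n * rel_entropy(k / n, p))
point_lower = math.sqrt(n / (8 * k * (n - k))) * chernoff
print(point_lower, binom_pmf(n, p, k), binom_tail(n, p, k), chernoff)
```

The printed values should appear in increasing order: the tail is sandwiched between the point-mass lower bound and the Chernoff bound, with only a polynomial factor between the two.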

7.11 Proof of Proposition 8 

By Chebyshev's inequality, $h(V) \sim_{\mathbb{P}_0} Np_0/2$ with probability going to one. Since $h(S) \le |S|/2$, we have $h(S) \le Np_0/4$ for all subsets $S$ of size smaller than $Np_0/2 \to \infty$. Note that $|S|h(S) \sim \operatorname{Bin}(|S|^{(2)}, p_0)$. Applying Bernstein's inequality (Lemma 2) and Lemma 4 to all subsets $S$ of size larger than $Np_0/2$, we derive that



\S\hiS) < \^p, + J|5|3po(l -Po)log (^) + \S\log 



2 ' Y ' " ' \\S\ J ' ' " V l-^l 
with probability larger than 1 — exp(— A'^po/S). Comparing h{S) with Npo/2, we get 



^ < ^ + 2,M, ^ + '°g(iV/|S|) + = 1^ + 2, Mod) + »(1) , 

Npo - N \ N\l Npo Npo N \ N ^ ^ ^ ' 

since $Np_0 \gg \log(N)$. This quantity is away from one, except if $|S| \sim N$. As a consequence, $\max_S h(S) \sim h(V) \sim_{\mathbb{P}_0} Np_0/2$ with probability going to one.

Let us turn to the alternative distribution. Under $\mathbb{P}_S$, $|S|h(S) \sim \operatorname{Bin}(|S|^{(2)}, p_1)$. It follows that $h(S) \sim_{\mathbb{P}_S} np_1/2$ with probability going to one. The densest subgraph test is therefore powerful when $\liminf np_1/(Np_0) > 1$.

Let us now assume that $np_1/(Np_0) \to 0$. For any subset $T$, $|T|h(T)$ is the sum of two independent binomial distributions of parameters $(|S\cap T|^{(2)}, p_1)$ and $(|T|^{(2)} - |S\cap T|^{(2)}, p_0)$. Applying, as previously, Bernstein's inequality for all subsets $T$ of size larger than $Np_0/2$, we derive that



\T\h{T) < -^— + (pi -Po) + Wl^rPolog ( ^ 1 + |T|log I — — 



+j2|Tn5|3pilog( — ) +2|5nT|log ^ 



ISTiTiviy ' ' ''Visnrivi 

with probability going to one. Comparing $h(T)$ with $Np_0/2$, we get



Npo - N N Po ' ' y Npo 

Since we assume that $np_1 = o(Np_0)$, this quantity is away from one except if $|T| \sim N$.
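For intuition about the statistic in this proof, the sketch below computes the density $h(S)$, taken here to be the number of edges inside $S$ divided by $|S|$ (consistent with $|S|h(S) \sim \operatorname{Bin}(|S|^{(2)}, p_0)$ above), and finds the densest subgraph of a toy graph by brute force. The exhaustive search is exponential in the number of nodes and is only meant as an illustration; the function names and the example graph are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def density(edges, subset):
    """h(S): number of edges with both endpoints in S, divided by |S|."""
    s = set(subset)
    inside = sum(1 for u, v in edges if u in s and v in s)
    return Fraction(inside, len(s))


def densest_subgraph(num_nodes, edges):
    """Brute-force maximizer of h(S) over all non-empty subsets S."""
    best = (Fraction(0), ())
    for size in range(1, num_nodes + 1):
        for subset in combinations(range(num_nodes), size):
            h = density(edges, subset)
            if h > best[0]:
                best = (h, subset)
    return best


# Toy graph: a clique on nodes 0..3 with a path 3-4-5 attached.
clique = list(combinations(range(4), 2))
toy_edges = clique + [(3, 4), (4, 5)]
print(densest_subgraph(6, toy_edges))
```

On this toy graph the maximizer is the planted clique, whose density attains the upper bound $(|S|-1)/2$.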